Processing of sound data encoded in a sub-band domain

ABSTRACT

Processing of sound data encoded in a sub-band domain, for dual-channel playback of binaural or Transaural® type, is provided, in which a matrix filtering is applied so as to pass from a sound representation with N channels, with N>0, to a dual-channel representation. This sound representation with N channels comprises considering N virtual loudspeakers surrounding the head of a listener, and, for each virtual loudspeaker of at least some of the loudspeakers: a first transfer function specific to an ipsilateral path from the loudspeaker to a first ear of the listener, facing the loudspeaker, and a second transfer function specific to a contralateral path from said loudspeaker to the second ear of the listener, masked from the loudspeaker by the listener's head. The matrix filtering comprises a multiplicative coefficient defined by the spectrum, in the sub-band domain, of the second transfer function deconvolved with the first transfer function.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is the U.S. national phase of the International Patent Application No. PCT/FR2010/052119 filed Oct. 8, 2010, which claims the benefit of French Application No. 09 57118 filed Oct. 12, 2009, the entire content of which is incorporated herein by reference.

FIELD OF THE INVENTION

The invention relates to a processing of sound data.

In the context of the processing of sound data in a multichannel format (5.1 or more), it is sought to achieve a 3D spatialization effect called “Virtual Surround”. Such processing procedures involve filters which are aimed at reproducing a sound field at the inputs of a person's auditory canals.

BACKGROUND

Indeed, a listener is capable of locating sounds in space with a certain precision, by virtue of the perception of sounds by his two ears. The signals emitted by the sound sources undergo acoustic transformations while propagating up to the ears. These acoustic transformations are characteristic of the acoustic channel that becomes established between a sound source and a point of the individual's auditory canal. Each ear possesses its own acoustic channel, and these acoustic channels depend on the position and the orientation of the source in relation to the listener, the shape of the head and the ear of the listener, and also the acoustic environment (for example reverberation due to a hall effect). These acoustic channels may be modeled by filters commonly called “Head Impulse Responses” or HRIR (for “Head Related Impulse Responses”), or else “Head transfer functions” or HRTF (“Head Related Transfer Functions”) depending on whether a representation thereof is given in the time domain or frequency domain respectively.

FIG. 1 represents a “direct” pathway CD from a source HP1 to the (left) ear OG of the listener AU (viewed from above), this ear OG being situated directly facing the source HP1. Also represented is a “cross” pathway CC between a source HP2 and this same ear OG of the listener AU, the pathway CC passing through the head TET of the listener AU since the source HP2 is disposed on the other side of the mid-plane P with respect to the source HP1.

In an environment without reverberation (for example an anechoic chamber), considering that human faces are symmetric, the HRTF functions for the left ear and for the right ear (termed respectively “left HRTF” and “right HRTF” hereinafter) are identical for the sources which are situated in the mid-plane (plane P which separates the left half from the right half of the body as illustrated in FIG. 2). The acoustic indices utilized by the brain to locate the sounds are often classed into two families of indices:

-   so-called “monaural” indices relating to the locating of a sound on the basis of a single ear, and
-   so-called “interaural” indices relating to the locating of a sound by the brain by utilizing the differences between the signals perceived by the left ear and the right ear.

Known techniques for processing sound data in multi-channel format (for example with more than two loudspeakers) with a view to playback on two loudspeakers only, for example on a headset with a 3D spatialization effect, are described hereinafter.

The term “binaural playback” is then understood to denote listening on a headset to audio contents initially in the multi-channel format (for example in the 5.1 format, or other formats delivering more than two tracks), these audio contents being processed in particular with mixing of the channels so as to deliver only two signals feeding, in the so-called “binaural” configuration, the two mini loudspeakers (or “earpieces”) of a conventional stereophonic headset. Thus, in the transformation from a “multi-channel” format to a “binaural” format, it is sought to offer quality of spatialization and immersion to the headset similar or equivalent to that obtained with a multi-channel playback system comprising as many remote loudspeakers as channels. Furthermore, the term “Transaural® playback” is understood to denote listening on two remote loudspeakers to audio contents initially in a multi-channel format.

Conventionally, for listening to an audio content in the 5.1 multi-channel format on a stereophonic headset or on a pair of loudspeakers, a matrixing of the channels, hereinafter called “sub-mixing” or “Downmix”, is performed. A “Downmix” processing is a matrix processing which makes it possible to pass from N channels to M channels with N>M. It will be considered hereinafter that a “Downmix” processing (provided that it does not take account of spatialization effects) does not involve any filter based on HRTF functions. In general, the matrices of the “Downmix” processing used in sound playback devices (PC computer, DVD player, television, or the like) have constant coefficients which depend neither on time nor frequency. Recent “Downmix” processing procedures now exhibit matrices whose coefficients depend on time and frequency and are adjusted at each instant as a function of a time and frequency representation of the input signals. This type of matrix makes it possible for example to prevent the input signals from cancelling one another out by adding together. A constant-matrix version of a processing of “Downmix” type, termed “Downmix ITU”, has been standardized by the International Telecommunications Union “ITU”. This processing is applied by implementing the following equations:

$S_{G} = E_{AVG} + E_{C} \cdot 0.707 + E_{ARG} \cdot 0.707$

$S_{R} = E_{AVD} + E_{C} \cdot 0.707 + E_{ARD} \cdot 0.707$,

where:

-   S_(G) and S_(R) are respectively left and right output stereo signals,
-   E_(AVG) and E_(AVD) are respectively input signals which would have been intended to feed left AVG and right AVD lateral loudspeakers (illustrated in FIG. 2),
-   E_(ARG) and E_(ARD) are respectively input signals which would have been intended to feed rear left ARG and rear right ARD loudspeakers, situated behind the listener AU of FIG. 2,
-   E_(C) is an input signal which would have been intended to feed a central loudspeaker C situated facing the listener AU, and
-   0.707 represents an approximation of the square root of ½.

It is possible to consider such gains as gains applied to the loudspeakers.
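By way of illustration only, the “Downmix ITU” equations given above can be sketched in Python as follows. The function name `downmix_itu` and the representation of each channel as a list of samples are assumptions made for this sketch; only the two equations and the gain 0.707 come from the text.

```python
ITU_GAIN = 0.707  # approximation of the square root of 1/2, as stated above


def downmix_itu(e_avg, e_avd, e_c, e_arg, e_ard, gain=ITU_GAIN):
    """Return (s_g, s_r) from five equal-length lists of samples.

    Hypothetical helper: channel names mirror the symbols E_AVG, E_AVD,
    E_C, E_ARG and E_ARD of the equations.
    """
    # S_G = E_AVG + 0.707 * E_C + 0.707 * E_ARG
    s_g = [l + gain * c + gain * ls for l, c, ls in zip(e_avg, e_c, e_arg)]
    # S_R = E_AVD + 0.707 * E_C + 0.707 * E_ARD
    s_r = [r + gain * c + gain * rs for r, c, rs in zip(e_avd, e_c, e_ard)]
    return s_g, s_r
```

The coefficients depend neither on time nor on frequency, which is what makes this a constant-matrix processing.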

By way of example, the processing hereinafter termed “Downmix ITU” does not allow the accurate spatial perception of sound events. As indicated previously furthermore, a processing of “Downmix” type, generally, does not allow spatial perception since it does not involve any HRTF filter. The feeling of immersion that the contents can offer in the multi-channel format is then lost with headset listening with respect to listening on a system with more than two loudspeakers (for example in the 5.1 format as illustrated in FIG. 2). By way of example, a sound assumed to be emitted by a mobile source from the front to the rear of the listener is not played back correctly on a stereo-only system (on a headset with earpieces or a pair of loudspeakers). Furthermore, a sound present solely in the channel S_(G) (or S_(R)) and processed by the “Downmix ITU” sub-mixing is played back only in the left (or right, respectively) earpiece in the case of headset listening, whereas in the case of listening on a system with more than two loudspeakers (for example in the 5.1 format), the right (or left, respectively) ear also perceives a signal by diffraction.

In order to alleviate these drawbacks, the method of sub-mixing to a binaural format, termed “Binaural downmix”, has been developed. It consists in placing virtually five (or more) loudspeakers in a sound environment played back on two tracks only, as if five sources (or more) were to be spatialized for binaural playback. Thus, a content in the multi-channel format is broadcast on “virtual” loudspeakers in a context of binaural playback. The uses of such a technique currently lie mainly in DVD players (on PC computers, on televisions, on living-room DVD players, or the like), and soon on mobile terminals for playing televisual or video data.

In the “Binaural downmix” method, the virtual loudspeakers are created by the so-called “binaural synthesis” technique. This technique consists in applying head acoustic transfer functions (HRTF) to monophonic audio signals, so as to obtain a binaural signal which makes it possible, during headset listening, to have the sensation that the sound sources originate from a particular direction in space. The signal of the right ear is obtained by filtering the monophonic signal with the HRTF function of the right ear and the signal of the left ear is obtained by filtering this same monophonic signal with the HRTF function of the left ear. The resulting binaural signal is then available for headset listening.
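A minimal sketch of binaural synthesis in the time domain is given below, assuming that each HRTF is available as a finite impulse response (HRIR) and that filtering is carried out by direct linear convolution. The helper names `convolve` and `binaural_synthesis` are hypothetical, and the impulse responses used in any call are placeholders rather than measured data.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two finite sequences."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def binaural_synthesis(mono, hrir_left, hrir_right):
    """Filter one monophonic signal with the left-ear and right-ear HRIRs.

    Returns the pair (left ear signal, right ear signal).
    """
    return convolve(mono, hrir_left), convolve(mono, hrir_right)
```

For a source on the left, the right-ear response would typically be delayed and attenuated with respect to the left-ear response, which is what conveys the direction of the source.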

This implementation is illustrated in FIG. 3A. A transfer function defined by a filter is associated with each acoustic pathway between an ear of the listener and a virtual loudspeaker (placed as advocated in the 5.1 multi-channel format in the example represented). Thus, with reference to FIG. 3B, for ten acoustic pathways in all:

-   HCg (respectively HCd) is the filter corresponding to an HRTF for the pathway between the central loudspeaker C and the left OG (respectively right OD) ear of the listener,
-   HGg (respectively HDd) is the filter corresponding to a so-called “ipsilateral” HRTF (ear “illuminated” by the loudspeaker) for the direct pathway (solid line) between the left lateral AVG (respectively right lateral AVD) loudspeaker and the left OG (respectively right OD) ear of the listener,
-   HGd (respectively HDg) is the filter corresponding to a so-called “contralateral” HRTF (ear in “the shadow” of the head) for the indirect pathway (dashed lines) between the left lateral AVG (respectively right lateral AVD) loudspeaker and the right OD (respectively left OG) ear of the listener,
-   HGSg (respectively HDSd) is the filter corresponding to an ipsilateral HRTF for the direct pathway (solid line) between the rear left ARG (respectively rear right ARD) loudspeaker and the left OG (respectively right OD) ear of the listener, and
-   HGSd (respectively HDSg) is the filter corresponding to a contralateral HRTF for the indirect pathway (dashed line) between the rear left ARG (respectively rear right ARD) loudspeaker and the right OD (respectively left OG) ear of the listener.

A drawback of this technique is its complexity since it requires two binaural filters per virtual loudspeaker (an ipsilateral HRTF and a contralateral HRTF), therefore ten filters in all in the case of a 5.1 format.

The problem is made more acute when these transfer functions need to be manipulated in the course of various processing procedures such as those according to the MPEG standard and in particular the processing termed “MPEG Surround”®.

Indeed, with reference to point 6.11.4.2.2.2 of the document “Information technology—MPEG audio technologies—Part 1: MPEG Surround”, ISO/IEC JTC 1/SC 29 (21 Jul. 2006), a matrix filtering is provided for, in the domain of the sub-bands m (also denoted κ(k) here), of the type:

$H_{1}^{l,k} = \begin{bmatrix} h_{11}^{l,k} & h_{12}^{l,k} \\ h_{21}^{l,k} & h_{22}^{l,k} \end{bmatrix} = \begin{bmatrix} h_{L,L}^{l,\kappa(k)} & h_{L,R}^{l,\kappa(k)} & h_{L,C}^{l,\kappa(k)} \\ h_{R,L}^{l,\kappa(k)} & h_{R,R}^{l,\kappa(k)} & h_{R,C}^{l,\kappa(k)} \end{bmatrix} \cdot \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \end{bmatrix} \cdot W_{temp}^{l,\kappa(k)}, \quad 0 \leq k < K, \quad 0 \leq l < L$

in order to pass from two monophonic signals to stereophonic signals in binaural representation.

Indeed, this standard provides for an embodiment in which a multi-channel signal is transported in the form of a stereo mixing (downmix) and of spatialization parameters (denoted CLD for “Channel Level Difference”, ICC for “Inter-Channel Coherence”, and CPC for “Channel Prediction Coefficient”). These parameters make it possible in a first step to implement a processing for expanding the stereo mixing (or “downmix”) to three signals L′, R′ and C. In a second step, they allow the expansion of the signals L′, R′ and C so as to obtain signals 5.1 (denoted L, Ls, R, Rs, C and LFE for “Low Frequency Effect”). In the binaural mode, the signals C and LFE are not separate. The signal C is used for the Binaural downmix processing.

Therefore here, three signals (for respective left L′, right R′ and center C′ channels) are firstly constructed on the basis of two monophonic signals. Thus, the notation W_(temp) ^(l,m) designates a processing matrix for expanding stereo signals to these three channels.

The subsequent processing procedures are thereafter:

-   a processing for expanding these three channels to N channels in the multi-channel configuration, for example 5 channels in the 5.1 format, and
-   a processing for spatializing N virtual loudspeakers respectively associated with these N channels so as to obtain a binaural or Transaural®, dual-channel representation, with:

$h_{L,C}^{l,m} = P_{L,C}^{m} \cdot e^{+j\phi_{C}^{m}/2}$, for the path from a central loudspeaker associated with the aforementioned channel C to the left ear,

$h_{R,C}^{l,m} = P_{R,C}^{m} \cdot e^{-j\phi_{C}^{m}/2}$, for the path from the central loudspeaker associated with the channel C to the right ear,

$h_{L,L}^{l,m} = \sqrt{(\sigma_{L}^{l,m})^{2}(P_{L,L}^{m})^{2} + (\sigma_{Ls}^{l,m})^{2}(P_{L,Ls}^{m})^{2}}$, for the ipsilateral paths to the left ear,

$h_{R,L}^{l,m} = e^{-j(w_{L}^{l,m}\phi_{L}^{m} + w_{Ls}^{l,m}\phi_{Ls}^{m})}\sqrt{(\sigma_{L}^{l,m})^{2}(P_{R,L}^{m})^{2} + (\sigma_{Ls}^{l,m})^{2}(P_{R,Ls}^{m})^{2}}$, for the contralateral paths from the left loudspeakers to the right ear,

$h_{L,R}^{l,m} = e^{j(w_{R}^{l,m}\phi_{R}^{m} + w_{Rs}^{l,m}\phi_{Rs}^{m})}\sqrt{(\sigma_{R}^{l,m})^{2}(P_{L,R}^{m})^{2} + (\sigma_{Rs}^{l,m})^{2}(P_{L,Rs}^{m})^{2}}$, for the contralateral paths from the right loudspeakers to the left ear,

$h_{R,R}^{l,m} = \sqrt{(\sigma_{R}^{l,m})^{2}(P_{R,R}^{m})^{2} + (\sigma_{Rs}^{l,m})^{2}(P_{R,Rs}^{m})^{2}}$, for the ipsilateral paths to the right ear,

where:

-   σ_(L) ^(l,m) and σ_(Ls) ^(l,m) represent relative gains to be applied to the signal of the channel L′ so as to define channels L and Ls respectively of the left direct and left ambience virtual loudspeakers in the 5.1 format, for sample l of frequency band m in time-frequency transform,
-   σ_(R) ^(l,m) and σ_(Rs) ^(l,m) represent relative gains to be applied to the signal of the channel R′ so as to define channels R and Rs respectively of the right direct and right ambience virtual loudspeakers in the 5.1 format, for sample l of frequency band m in time-frequency transform,
-   φ_(L) ^(m), φ_(Ls) ^(m), φ_(R) ^(m) and φ_(Rs) ^(m) are phase shifts corresponding to interaural delays, and
-   w_(L) ^(l,m), w_(Ls) ^(l,m), w_(R) ^(l,m) and w_(Rs) ^(l,m) are weightings such that:

$w_{L}^{l,m} = \frac{(\sigma_{L}^{l,m})^{2}(P_{R,L}^{m})^{2}}{(\sigma_{L}^{l,m})^{2}(P_{R,L}^{m})^{2} + (\sigma_{Ls}^{l,m})^{2}(P_{R,Ls}^{m})^{2}}$,

$w_{Ls}^{l,m} = \frac{(\sigma_{Ls}^{l,m})^{2}(P_{R,Ls}^{m})^{2}}{(\sigma_{L}^{l,m})^{2}(P_{R,L}^{m})^{2} + (\sigma_{Ls}^{l,m})^{2}(P_{R,Ls}^{m})^{2}}$,

$w_{R}^{l,m} = \frac{(\sigma_{R}^{l,m})^{2}(P_{L,R}^{m})^{2}}{(\sigma_{R}^{l,m})^{2}(P_{L,R}^{m})^{2} + (\sigma_{Rs}^{l,m})^{2}(P_{L,Rs}^{m})^{2}}$,

$w_{Rs}^{l,m} = \frac{(\sigma_{Rs}^{l,m})^{2}(P_{L,Rs}^{m})^{2}}{(\sigma_{R}^{l,m})^{2}(P_{L,R}^{m})^{2} + (\sigma_{Rs}^{l,m})^{2}(P_{L,Rs}^{m})^{2}}$.

The following notation, in particular, will be adopted:

-   P_(L,C) ^(m) is the expression for the spectrum of the transfer function of HRTF type for a path between a central loudspeaker in the 5.1 format and the left ear of a listener,
-   P_(R,C) ^(m) is the expression for the spectrum of the transfer function of HRTF type for a path between a central loudspeaker in the 5.1 format and the right ear of a listener,
-   P_(L,Ls) ^(m) is the expression for the spectrum of the HRTF for a path between a left ambience loudspeaker in the 5.1 format and the left ear,
-   P_(R,Ls) ^(m) is the expression for the spectrum of the HRTF for a path between a left ambience loudspeaker in the 5.1 format and the right ear,
-   P_(L,Rs) ^(m) is the expression for the spectrum of the HRTF for a path between a right ambience loudspeaker in the 5.1 format and the left ear,
-   P_(R,Rs) ^(m) is the expression for the spectrum of the HRTF for a path between a right ambience loudspeaker in the 5.1 format and the right ear,
-   P_(L,R) ^(m) is the expression for the spectrum of the HRTF for a path between a right loudspeaker in the 5.1 format and the left ear,
-   P_(R,R) ^(m) is the expression for the spectrum of the HRTF for a path between a right loudspeaker in the 5.1 format and the right ear,
-   P_(L,L) ^(m) is the expression for the spectrum of the HRTF for a path between a left loudspeaker in the 5.1 format and the left ear, and
-   P_(R,L) ^(m) is the expression for the spectrum of the HRTF for a path between a left loudspeaker in the 5.1 format and the right ear.

In this example, there are thus ten filters associated with the aforementioned HRTF transfer functions for passing from the 5.1 format to a binaural representation. Hence the complexity problem posed by this technique, requiring two binaural filters per virtual loudspeaker (an ipsilateral HRTF and a contralateral HRTF).
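To make the cost of this prior-art scheme concrete, the sketch below evaluates, for one sample l and one band m, an ipsilateral coefficient and a contralateral coefficient from the expressions given above. The function names are hypothetical, the P values are assumed to be real magnitudes, and returning zero when both energies vanish is a convention chosen for this sketch only.

```python
import cmath
import math


def prior_art_h_ll(sigma_l, sigma_ls, p_ll, p_lls):
    """Ipsilateral coefficient h_{L,L}: needs the two ipsilateral spectra."""
    return math.sqrt((sigma_l * p_ll) ** 2 + (sigma_ls * p_lls) ** 2)


def prior_art_h_rl(sigma_l, sigma_ls, p_rl, p_rls, phi_l, phi_ls):
    """Contralateral coefficient h_{R,L}: needs the two contralateral spectra."""
    e_l = (sigma_l * p_rl) ** 2      # energy through the direct loudspeaker
    e_ls = (sigma_ls * p_rls) ** 2   # energy through the ambience loudspeaker
    total = e_l + e_ls
    if total == 0.0:
        return 0j  # assumption: no energy, no contribution
    w_l = e_l / total                # weighting w_L
    w_ls = e_ls / total              # weighting w_Ls
    phase = w_l * phi_l + w_ls * phi_ls
    return cmath.exp(-1j * phase) * math.sqrt(total)
```

Each side thus calls on four HRTF spectra (two ipsilateral, two contralateral), to which the two spectra of the central loudspeaker are added, giving the ten filters mentioned above.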

SUMMARY

The present invention aims to improve the situation.

For this purpose, it proposes firstly a method for processing sound data encoded in a sub-band domain, for dual-channel playback of binaural or Transaural® type, in which a matrix filtering is applied so as to pass from a sound representation with N channels, with N>0, to a dual-channel representation, this sound representation with N channels consisting in considering N virtual loudspeakers surrounding the head of a listener, and, for each virtual loudspeaker of at least some of the loudspeakers:

-   a first transfer function specific to an ipsilateral path from the loudspeaker to a first ear of the listener, facing the loudspeaker, and
-   a second transfer function specific to a contralateral path from said loudspeaker to the second ear of the listener, masked from the loudspeaker by the listener's head.

Advantageously, the matrix filtering applied comprises a multiplicative coefficient defined by the spectrum, in the sub-band domain, of the second transfer function deconvolved with the first transfer function.
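Since a deconvolution in the time domain corresponds to a division of spectra in the frequency domain, such a coefficient may be pictured, band by band, as the ratio of the contralateral spectrum to the ipsilateral spectrum. The sketch below is an assumption-laden illustration: the function name `deconvolved_spectrum`, the one-complex-value-per-band representation and the small regularization term `eps` are not taken from the text.

```python
import cmath


def deconvolved_spectrum(contra, ipsi, eps=1e-12):
    """Return one (P, phi) pair per sub-band.

    contra, ipsi: complex spectra of the contralateral and ipsilateral
    transfer functions, one value per sub-band (hypothetical layout).
    The ratio contra / ipsi is written P * exp(-j * phi), so that P is a
    magnitude and phi a phase shift usable as an interaural delay term.
    """
    result = []
    for c, i in zip(contra, ipsi):
        # Regularized division: c / i = c * conj(i) / |i|^2
        ratio = c * i.conjugate() / (abs(i) ** 2 + eps)
        result.append((abs(ratio), -cmath.phase(ratio)))
    return result
```

With this convention the ipsilateral path reduces to the identity (ratio of a spectrum to itself), which is why no ipsilateral filter remains to be applied.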

A first advantage which ensues from such a construction is the significant reduction in the complexity of the processing procedures. Already, as will be seen in detail further on, the transfer functions of the central virtual loudspeaker no longer need to be taken into account. Thus, it is not necessary to take into account the transfer functions of all the virtual loudspeakers, but of only some of the virtual loudspeakers.

Another simplification which ensues from the construction within the meaning of the invention is that it is no longer necessary to provide for a transfer function for the ipsilateral paths. For example, in the case of a matrix filtering to pass from a sound representation with M channels, with M>0, to a dual-channel representation (binaural or transaural), by passing through an intermediate representation on the N channels, with N>2, as in the case of the standard described hereinabove, the coefficients of the matrix are expressed, for a contralateral path, in particular as a function of respective spatialization gains of the M channels on the N virtual loudspeakers situated in a hemisphere around a first ear, and of the spectra of the contralateral transfer function, relating to the second ear of the listener, deconvolved with the ipsilateral transfer function, relating to the first ear. However, in an advantageous manner, for an ipsilateral path, the coefficients of the matrix are no longer expressed as a function of the spectra of HRTFs but simply as a function of spatialization gains of the M channels on the N virtual loudspeakers situated in a hemisphere around a first ear.

Thus, if the representation with N channels comprises, per hemisphere around an ear, at least one direct virtual loudspeaker and one ambience virtual loudspeaker as in “virtual surround”, the coefficients of the matrix are expressed, in a sub-band domain as time-frequency transform (for example of “PQMF” type for “Pseudo-Quadrature Mirror Filters”), by:

$h_{L,C}^{l,m} = g(1 + P_{L,R}^{m} \cdot e^{-j\phi_{R}^{m}})$

$h_{R,C}^{l,m} = g(1 + P_{R,L}^{m} \cdot e^{-j\phi_{L}^{m}})$

If the HRTF functions are symmetric, we have $h_{L,C}^{l,m} = h_{R,C}^{l,m}$.

$h_{L,R}^{l,m} = e^{j(w_{R}^{l,m}\phi_{R}^{m} + w_{Rs}^{l,m}\phi_{Rs}^{m})}\sqrt{(\sigma_{R}^{l,m})^{2}(P_{L,R}^{m})^{2} + (\sigma_{Rs}^{l,m})^{2}(P_{L,Rs}^{m})^{2}}$, for the contralateral paths to the left ear;

$h_{R,L}^{l,m} = e^{-j(w_{L}^{l,m}\phi_{L}^{m} + w_{Ls}^{l,m}\phi_{Ls}^{m})}\sqrt{(\sigma_{L}^{l,m})^{2}(P_{R,L}^{m})^{2} + (\sigma_{Ls}^{l,m})^{2}(P_{R,Ls}^{m})^{2}}$, for the contralateral paths to the right ear;

-   $h_{L,L}^{l,m} = \sqrt{(\sigma_{L}^{l,m})^{2} + (\sigma_{Ls}^{l,m})^{2}}$ only, for the ipsilateral paths to the left ear;
-   $h_{R,R}^{l,m} = \sqrt{(\sigma_{R}^{l,m})^{2} + (\sigma_{Rs}^{l,m})^{2}}$ only, for the ipsilateral paths to the right ear,

where:

-   σ_(L) ^(l,m) and σ_(Ls) ^(l,m) represent relative gains to be applied to one and the same first signal (for example the signal of the channel L′ in an initial configuration with three channels, as described hereinabove) so as to define channels L and Ls respectively of the left direct and left ambience virtual loudspeakers, for sample l of frequency band m in time-frequency transform,
-   σ_(R) ^(l,m) and σ_(Rs) ^(l,m) represent relative gains to be applied to one and the same second signal (for example the channel R′) so as to define channels R and Rs respectively of the right direct and right ambience virtual loudspeakers, for sample l of frequency band m in time-frequency transform,
-   P_(R,L) ^(m) or P_(R,Ls) ^(m) is the expression for the spectrum of the transfer function of contralateral HRTF type, relating to the right ear of the listener, deconvolved with an ipsilateral transfer function, relating to the left ear, for a direct or respectively ambience, left virtual loudspeaker,
-   P_(L,R) ^(m) or P_(L,Rs) ^(m) is the expression for the spectrum of the transfer function of contralateral HRTF type, relating to the left ear of the listener, deconvolved with an ipsilateral transfer function, relating to the right ear, for a direct or respectively ambience, right virtual loudspeaker,
-   φ_(L) ^(m), φ_(Ls) ^(m), φ_(R) ^(m) and φ_(Rs) ^(m) are phase shifts between contralateral and ipsilateral transfer functions corresponding to chosen interaural delays, and
-   w_(L) ^(l,m), w_(Ls) ^(l,m), w_(R) ^(l,m) and w_(Rs) ^(l,m) are chosen weightings.

Typically, the coefficient g can have an advantageous value of 0.707 (corresponding to the root of ½, when provision is made for an energy apportionment of half of the signal of the central loudspeaker on the lateral loudspeakers), as advocated in the “Downmix ITU” processing.
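The six coefficients defined above can be gathered, for one sample l and one band m, in a short Python sketch. The dictionary layout and the function names are hypothetical. The text describes the weightings w only as “chosen weightings”; the energy-based weightings of the prior-art expressions are reused here purely as an assumption.

```python
import cmath
import math


def contralateral(sig_d, sig_s, p_d, p_s, phi_d, phi_s, sign):
    """Contralateral coefficient for one side (direct + ambience loudspeakers).

    p_d, p_s: deconvolved contralateral spectra (real magnitudes assumed).
    sign: +1 for h_{L,R}, -1 for h_{R,L}.
    """
    e_d = (sig_d * p_d) ** 2
    e_s = (sig_s * p_s) ** 2
    total = e_d + e_s
    if total == 0.0:
        return 0j  # assumption: no energy, no contribution
    # Assumed energy-based weightings: w_d = e_d / total, w_s = e_s / total
    phase = (e_d * phi_d + e_s * phi_s) / total
    return cmath.exp(sign * 1j * phase) * math.sqrt(total)


def matrix_coefficients(sigma, p, phi, g=0.707):
    """Return the coefficients h keyed by "ear,channel" for one (l, m)."""
    return {
        "L,C": g * (1 + p["L,R"] * cmath.exp(-1j * phi["R"])),
        "R,C": g * (1 + p["R,L"] * cmath.exp(-1j * phi["L"])),
        "L,R": contralateral(sigma["R"], sigma["Rs"], p["L,R"], p["L,Rs"],
                             phi["R"], phi["Rs"], +1),
        "R,L": contralateral(sigma["L"], sigma["Ls"], p["R,L"], p["R,Ls"],
                             phi["L"], phi["Ls"], -1),
        # Ipsilateral paths: spatialization gains only, no HRTF spectrum
        "L,L": math.sqrt(sigma["L"] ** 2 + sigma["Ls"] ** 2),
        "R,R": math.sqrt(sigma["R"] ** 2 + sigma["Rs"] ** 2),
    }
```

Only the four deconvolved contralateral spectra appear as inputs: neither the ipsilateral spectra nor those of the central loudspeaker are required.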

More precisely, through the implementation of the invention, the matrix filtering is expressed according to a product of matrices of type:

$H_{1}^{l,k} = \begin{bmatrix} h_{11}^{l,k} & h_{12}^{l,k} \\ h_{21}^{l,k} & h_{22}^{l,k} \end{bmatrix} = \begin{bmatrix} h_{L,L}^{l,\kappa(k)} & h_{L,R}^{l,\kappa(k)} & h_{L,C}^{l,\kappa(k)} \\ h_{R,L}^{l,\kappa(k)} & h_{R,R}^{l,\kappa(k)} & h_{R,C}^{l,\kappa(k)} \end{bmatrix} \cdot \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \end{bmatrix} \cdot W_{temp}^{l,\kappa(k)}, \quad 0 \leq k < K, \quad 0 \leq l < L,$

where:

-   W_(temp) ^(l,m) represents the processing matrix for expanding stereo signals to M′ channels, with M′>2 (for example M′=3), and

$\begin{bmatrix} h_{L,L}^{l,\kappa(k)} & h_{L,R}^{l,\kappa(k)} & h_{L,C}^{l,\kappa(k)} \\ h_{R,L}^{l,\kappa(k)} & h_{R,R}^{l,\kappa(k)} & h_{R,C}^{l,\kappa(k)} \end{bmatrix} \cdot \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \end{bmatrix}$ represents a global matrix processing comprising:

-   a processing for expanding M′ channels to the N channels, with N>3 (for example 5, for a 5.1 format), and
-   a processing for spatializing the N virtual loudspeakers respectively associated with the N channels so as to obtain a binaural or Transaural®, dual-channel representation.
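The product of matrices above can be sketched for a single sample and a single band as follows. The dimensions are inferred from the equation itself: the spatialization matrix is 2×3, the selection matrix is 3×6, so that the expansion matrix is assumed here to have 6 rows and 2 columns. The helper names and the plain nested-list representation are hypothetical.

```python
# 3x6 selection matrix of the equation: keeps the first three rows
SELECT_3_OF_6 = [
    [1, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
]


def matmul(a, b):
    """Product of two matrices given as nested lists (rows of a by columns of b)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def binaural_matrix(h_spat, w_temp):
    """Return the 2x2 matrix H_1 = h_spat . SELECT_3_OF_6 . w_temp.

    h_spat: 2x3 matrix [[h_LL, h_LR, h_LC], [h_RL, h_RR, h_RC]].
    w_temp: expansion matrix, assumed 6x2 from the dimensions above.
    """
    return matmul(matmul(h_spat, SELECT_3_OF_6), w_temp)
```

The resulting 2×2 matrix is then applied directly to the two transported stereo signals, so that the intermediate N-channel representation never needs to be computed explicitly.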

Another drawback of the “Binaural downmix” method within the meaning of the prior art is that it does not retain the timbre of the initial sound, which is played back well by the “Downmix” processing, since the filters of the binaural processing resulting from the HRTFs greatly modify the spectrum of the signals and thus achieve “coloration” effects by comparison with “Downmix”. Moreover, the great majority of users prefer “Downmix” even if “Binaural downmix” actually affords an extra-cranial spatial perception of sounds. The drawback of the impairment of timbre (or “coloration”) afforded by “Binaural Downmix” is not compensated for by the affording of spatialization effects, according to the feeling of users.

Here again, the construction within the meaning of the present invention aims to improve the situation. The implementation of the invention such as described hereinabove makes it possible to safeguard the perceived timbre of the sound sources from any distortion.

Indeed, the filtering of the contralateral component, defined by the contralateral transfer function deconvolved with the ipsilateral transfer function, makes it possible to reduce the distortion of timbre afforded by the binauralization processing. As will be seen further on, such a filtering amounts to a low-pass filtering delayed by a value corresponding to the interaural delay. It is advantageously possible to choose a cutoff frequency of the low-pass filter for all the HRTF pairs at about 500 Hz, with a very sizable filter slope. The brain perceives, on one ear, the original signal (without processing) and, on the other ear, the delayed and low-pass-filtered signal. Beyond the cutoff frequency, the perceived difference in level with respect to diotic listening to the original signal attenuated by 6 dB is tiny. On the other hand, under the cutoff frequency, the signal is perceived twice as strongly. For the signals containing frequencies under the cutoff frequency, the difference in timbre will therefore consist of an amplification of the low frequencies.

Such impairment of timbre can advantageously be eliminated simply by high-pass filtering, which may be the same for all the HRTF transfer functions (directions of loudspeakers). In the case of a processing for binaural playback, the aforementioned high-pass filtering can advantageously be applied to the binaural stereo signal resulting from the sub-mixing. Furthermore, to avoid a difference in loudness between the results of a processing of “Downmix” type and a binauralization processing within the meaning of the invention, provision may furthermore advantageously be made for an automatic gain control at the end of the processing, so as to contrive matters such that the levels that would be delivered by the Downmix processing and the binauralization processing within the meaning of the invention are similar. For this purpose, as will be seen in detail further on, a high-pass filter and an automatic gain control are provided at the end of the processing chain.

Thus, in more generic terms, a chosen gain is furthermore applied to two signals, left track and right track, in a dual-channel representation (binaural or Transaural®), before playback, the chosen gain being controlled so as to limit an energy of the left track and right track signals, to the maximum, to an energy of signals of the virtual loudspeakers. In a practical implementation, an automatic gain control is preferably applied to the two signals, left track and right track, downstream of the application of the frequency-variable weighting factor.

Furthermore, advantage is taken of the processing within the meaning of the invention so as to eliminate the distortion of coloration afforded by the customary binauralization processing. It is indeed apparent that the coloration distortion reduction processing is very simple to carry out when it is implemented in the transformed domain of the sub-bands. Indeed, the equations hereinabove giving the coefficients of matrices become simply:

$h_{L,C}^{l,m} = g(1 + P_{L,R}^{m} \cdot e^{-j\phi_{R}^{m}}) * Gain$

$h_{R,C}^{l,m} = g(1 + P_{R,L}^{m} \cdot e^{-j\phi_{L}^{m}}) * Gain$

$h_{L,L}^{l,m} = \sqrt{(\sigma_{L}^{l,m})^{2} + (\sigma_{Ls}^{l,m})^{2}} * Gain$

$h_{R,L}^{l,m} = e^{-j(w_{L}^{l,m}\phi_{L}^{m} + w_{Ls}^{l,m}\phi_{Ls}^{m})}\sqrt{(\sigma_{L}^{l,m})^{2}(P_{R,L}^{m})^{2} + (\sigma_{Ls}^{l,m})^{2}(P_{R,Ls}^{m})^{2}} * Gain$

$h_{L,R}^{l,m} = e^{j(w_{R}^{l,m}\phi_{R}^{m} + w_{Rs}^{l,m}\phi_{Rs}^{m})}\sqrt{(\sigma_{R}^{l,m})^{2}(P_{L,R}^{m})^{2} + (\sigma_{Rs}^{l,m})^{2}(P_{L,Rs}^{m})^{2}} * Gain$

$h_{R,R}^{l,m} = \sqrt{(\sigma_{R}^{l,m})^{2} + (\sigma_{Rs}^{l,m})^{2}} * Gain$

The “Gain” weighting in the equations hereinabove is such that, in an exemplary embodiment:

Gain=0.5 if the frequency band of index m is such that m<9 (or if the frequency f is itself less than 500 Hz) and

Gain=1, otherwise.

Thus, in more generic terms, the coefficients of the aforementioned matrix involved in the matrix filtering vary as a function of frequency, according to a weighting of a chosen factor (Gain) less than one, if the frequency is less than a chosen threshold, and of one otherwise. In the exemplary embodiment given hereinabove, the factor is about 0.5 and the chosen frequency threshold is about 500 Hz so as to eliminate a coloration distortion.
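The frequency-dependent weighting can be sketched as a per-band scaling of the matrix coefficients. The values 0.5 and m<9 are those of the exemplary embodiment; the function names and the list-of-dictionaries layout (one dictionary of coefficients per band index m) are assumptions made for this sketch.

```python
def band_gain(m, threshold_band=9, low_gain=0.5):
    """Weighting factor for sub-band m: low_gain below the threshold, 1 otherwise."""
    return low_gain if m < threshold_band else 1.0


def weight_coefficients(coeffs_per_band, threshold_band=9, low_gain=0.5):
    """Scale every matrix coefficient of each band by that band's Gain.

    coeffs_per_band[m] is a dict of coefficients (e.g. keys "L,L", "R,L")
    for the band of index m.
    """
    return [
        {name: value * band_gain(m, threshold_band, low_gain)
         for name, value in coeffs.items()}
        for m, coeffs in enumerate(coeffs_per_band)
    ]
```

Because the weighting is a plain multiplication of coefficients that already exist per band, it adds no filtering stage of its own in the sub-band domain.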

It is also possible to apply this gain directly at the processing output, in particular to the output signals before playback on loudspeakers or earpieces, by applying to the equations:

${y_{B}^{n,k} = {\begin{bmatrix}y_{L_{B}}^{n,k} \\y_{R_{B}}^{n,k}\end{bmatrix} = {\begin{bmatrix}h_{11}^{n,k} & h_{12}^{n,k} \\h_{21}^{n,k} & h_{22}^{n,k}\end{bmatrix}\begin{bmatrix}y_{L_{0}}^{n,k} \\y_{R_{0}}^{n,k}\end{bmatrix}}}},{0 \leq k < K}$

the aforementioned gain, as follows:

$y_{B}^{n,k} = {{\begin{bmatrix}{y_{L_{B}}^{n,k}*{Gain}} \\{y_{R_{B}}^{n,k}*{Gain}}\end{bmatrix}\mspace{14mu} 0} \leq k < K}$

The “Gain” weighting and the automatic gain control can also be integrated into one and the same processing, as follows:

${Gain} = {0.5*\sqrt{\frac{\sum\limits_{k}( {{( y_{L_{0}}^{n,k} )( y_{L_{0}}^{n,k} )^{*}} + {( y_{R_{0}}^{n,k} )( y_{R_{0}}^{n,k} )^{*}}} )}{\sum\limits_{k}( {{( y_{L_{B}}^{n,k} )( y_{L_{B}}^{n,k} )^{*}} + {( y_{R_{B}}^{n,k} )( y_{R_{B}}^{n,k} )^{*}}} )}}}$

if the frequency band of index m is such that m<9 (or if the frequency f is itself less than 500 Hz) and

${{Gain} = \sqrt{\frac{\sum\limits_{k}( {{( y_{L_{0}}^{n,k} )( y_{L_{0}}^{n,k} )^{*}} + {( y_{R_{0}}^{n,k} )( y_{R_{0}}^{n,k} )^{*}}} )}{\sum\limits_{k}( {{( y_{L_{B}}^{n,k} )( y_{L_{B}}^{n,k} )^{*}} + {( y_{R_{B}}^{n,k} )( y_{R_{B}}^{n,k} )^{*}}} )}}},{{otherwise}.}$
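The combined weighting and automatic gain control given above may be sketched, under stated assumptions, as follows. The function names and the list-based signal layout are hypothetical, and the neutral gain returned for a silent binaural signal is an assumption of this sketch, not part of the described processing.

```python
import math


def stereo_energy(left, right):
    """Sum over the sub-bands k of |left_k|^2 + |right_k|^2."""
    return sum(abs(x) ** 2 for x in left) + sum(abs(x) ** 2 for x in right)


def combined_gains(l0, r0, lb, rb, threshold_band=9):
    """Per-band gain merging the 0.5 low-band weighting with the gain control.

    l0, r0: Downmix sub-band samples of one time slot n (index k).
    lb, rb: binauralized sub-band samples of the same time slot.
    """
    e_binaural = stereo_energy(lb, rb)
    if e_binaural == 0.0:
        # Assumption: with a silent binaural signal, leave the gain neutral.
        agc = 1.0
    else:
        agc = math.sqrt(stereo_energy(l0, r0) / e_binaural)
    return [0.5 * agc if k < threshold_band else agc for k in range(len(lb))]


# Hypothetical slot of 12 sub-bands: the binaural pair has 4x the Downmix energy.
gains = combined_gains([1 + 0j] * 12, [1 + 0j] * 12, [2 + 0j] * 12, [2 + 0j] * 12)
```

With these placeholder inputs the energy ratio is 1/4, so the gain control factor is 0.5 and is further halved in the bands of index below 9.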

Another advantage afforded by the invention is the transport of the encoded signal and its processing with a decoder so as to improve its sound quality, for example a decoder of MPEG Surround® type.

In the context of the invention where no transfer function is applied for the direct paths (ipsilateral contributions) and an additional processing is provided for on the indirect paths (spectrum of the contralateral transfer function deconvolved with the ipsilateral transfer function), it is interesting to note that, by applying a gain of 0.707 to the signals of the central and ambience (rear left and rear right) channels, the unprocessed part of the stereo sub-mixing (the ipsilateral contributions) exhibits the same form as the result of a processing of Downmix ITU type. It is possible to generalize the foregoing to any type of sub-mixing processing (Downmix). Indeed, a Downmix processing to two channels generally consists in applying a weighting to the channels (of the virtual loudspeakers), and then in summing the N channels to two output signals. Applying a binaural spatialization processing to the Downmix processing consists in applying to the N weighted channels the HRTF filters corresponding to the positions of the N virtual loudspeakers. As these filters are equal to 1 for the ipsilateral contributions, the Downmix processing is indeed retrieved by applying the sum of the ipsilateral contributions.

Therefore, the signals obtained by a binauralization processing within the meaning of the invention arise from a sum of signals of Downmix type and a stereo signal comprising the location indices required by the brain in order to perceive the spatialization of the sounds. This second signal is called “Additional Binaural Downmix” hereinafter, so that the processing within the meaning of the invention, called “Binaural Downmix” here, is such that:

“Binaural Downmix”=“Downmix”+“Additional Binaural Downmix”.

The latter equation may be generalized to:

“Binaural Downmix”=“Downmix”+α“Additional Binaural Downmix”

In this equation, α may be a coefficient lying between 0 and 1. For example, a listener can choose the level of the coefficient α between 0 and 1, continuously or by toggling between 0 and 1 (in “ON-OFF” mode). Thus, it is possible to choose a weighting α of the second processing “Additional Binaural Downmix” in the global processing using the matrix filtering within the meaning of the invention.

It is also possible to consider the weighting α in this equation as a quantization function, for example based on energy thresholding of the result of the ABD (for “Additional Binaural Downmix”) processing (with for example α=0 if the result of the ABD processing exhibits, in a given spectral band, an energy below a threshold, and α=1 otherwise, for this same spectral band). This embodiment exhibits the advantage of requiring only a small bandwidth for the transmission of the results of the Downmix and ABD processing procedures, from a coder to a decoder as represented in FIG. 7 described further on, demanding bitrate only if the result of the ABD processing is significant with respect to the result of the Downmix. Of course, provision may be made for various thresholds with for example α=0; 0.25; 0.5; 0.75; 1.
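A minimal sketch of the two-level (α=0 or α=1) thresholding described above is given below. The function names, the per-band list layout and the threshold value are hypothetical and serve only to illustrate the combination “Downmix”+α“Additional Binaural Downmix” band by band.

```python
def band_alpha(abd_band, threshold):
    """Return 0.0 if the ABD energy in this band is below the threshold, else 1.0."""
    energy = sum(abs(x) ** 2 for x in abd_band)
    return 0.0 if energy < threshold else 1.0


def weighted_binaural_downmix(downmix_bands, abd_bands, threshold):
    """Per band: Downmix + alpha * ABD, with alpha chosen by energy thresholding."""
    result = []
    for d_band, a_band in zip(downmix_bands, abd_bands):
        alpha = band_alpha(a_band, threshold)
        result.append([d + alpha * a for d, a in zip(d_band, a_band)])
    return result


# Hypothetical data: two bands of two samples each, for one output track.
downmix_bands = [[1.0, 1.0], [1.0, 1.0]]
abd_bands = [[0.01, 0.01], [0.5, 0.5]]
mixed = weighted_binaural_downmix(downmix_bands, abd_bands, threshold=0.1)
```

In this example the ABD contribution of the first band falls below the threshold and is dropped, whereas that of the second band is added in full.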

This additional signal requires only little bitrate to transport it. Indeed, it takes the form of a residual, low-pass-filtered signal which therefore a priori has much less energy than the Downmix signal. Furthermore, it exhibits redundancies with the Downmix signal. This property may be advantageously utilized jointly with codecs of Dolby Surround, Dolby Prologic or MPEG Surround type.

The “Additional Binaural Downmix” signal can then be compressed and transported in an additional and/or scalable manner with the Downmix signal, with little bitrate. During headset listening, the addition of the two stereo signals allows the listener to profit fully from the binaural signal with a quality that is very similar to a 5.1 format.

Thus, it suffices to decode the “Additional Binaural Downmix” signal and to add it directly to the Downmix signal. Provision may be made to embody a scalable coder, transporting for example by default a stereo signal without binauralization effect, and, if the bitrate so allows, furthermore transporting an additional-signal over-layer for the binauralization.

In the case of the MPEG Surround coder, in which provision is currently made, in one of its operational modes, to transport a stereo signal (of Downmix type) and to carry out the binauralization processing in the coded (or transformed) domain, reduced complexity and a better quality of rendition are obtained. In the case of headset rendition, the decoder simply has to calculate the “Additional Binaural Downmix” signal. The complexity is therefore reduced, without any risk of degradation of the signal of Downmix type. The sound quality thereof can only be improved.

Such characteristics are summarized as follows: the matrix filtering within the meaning of the invention consists in applying, in an advantageous embodiment:

- a first sub-mixing processing of the N channels into two stereo signals (for example of Downmix type), and
- a second processing leading, when it is executed jointly with the first processing, to a spatialization of the N virtual loudspeakers respectively associated with the N channels so as to obtain a binaural or Transaural® dual-channel representation.

Advantageously, the application of the second processing is decided as an option (for example as a function of the bitrate, of the capabilities for spatialized playback of a terminal, or the like). The aforementioned first processing may be applied in a coder communicating with a decoder, while the second processing is advantageously applied at the decoder.

The management of the processing procedures within the meaning of the invention can advantageously be conducted by a computer program comprising instructions for the implementation of the method according to the invention, when this program is executed by a processor, for example with a decoder in particular. In this respect, the invention is also aimed at such a program.

The present invention is also aimed at a module equipped with a processor and with a memory, and which is able to execute this computer program. A module within the meaning of the invention, for the processing of sound data encoded in a sub-band domain, with a view to dual-channel playback of binaural or Transaural® type, hence comprises means for applying a matrix filtering so as to pass from a sound representation with N channels with N>0, to a dual-channel representation. The sound representation with N channels consists in considering N virtual loudspeakers surrounding the head of a listener, and, for each virtual loudspeaker of at least some of the loudspeakers:

- a first transfer function specific to an ipsilateral path from the loudspeaker to a first ear of the listener, facing the loudspeaker, and
- a second transfer function specific to a contralateral path from said loudspeaker to the second ear of the listener, masked from the loudspeaker by the listener's head.

The matrix filtering applied comprises a multiplicative coefficient defined by the spectrum, in the sub-band domain, of the second transfer function deconvolved with the first transfer function.

Such a module can advantageously be a decoder of MPEG Surround® type and furthermore comprise decoding means of MPEG Surround® type, or can, as a variant, be built into such a decoder.

BRIEF DESCRIPTION OF THE DRAWINGS

Other characteristics and advantages of the invention will be apparent on examining the detailed description hereinafter and the appended drawings in which:

FIG. 1 schematically represents a playback on two loudspeakers around the head of a listener;

FIG. 2 schematically represents a playback on five loudspeakers in 5.1 multi-channel format;

FIG. 3A schematically represents the ipsilateral (solid lines) and contralateral (dashed lines) paths in 5.1 multi-channel format;

FIG. 3B represents a processing diagram of the prior art for passing from a 5.1 multi-channel format illustrated in FIG. 3A to a binaural or transaural format;

FIG. 4A schematically represents the ipsilateral (solid lines) and contralateral (dashed lines) paths in 5.1 multi-channel format, with furthermore the ipsilateral and contralateral paths of the central loudspeaker;

FIG. 4B represents a processing diagram for passing from a 5.1 multi-channel format illustrated in FIG. 4A to a binaural or transaural format, with four filters only in an embodiment within the meaning of the invention;

FIG. 5 illustrates a processing equivalent to the application of one of the filters of FIG. 4B;

FIG. 6 illustrates an additional processing of high-pass filtering and automatic gain control to be applied to the outputs S_(G) and S_(D) to avoid a coloration distortion and a difference of timbre between a “Downmix” processing and a processing within the meaning of the invention;

FIG. 7 illustrates the situation of a processing within the meaning of the invention, carried out with the coder in a possible exemplary embodiment of the invention, in particular in the case of an additional ABD processing to be combined with the Downmix processing.

DETAILED DESCRIPTION

Reference is made firstly to FIG. 4A to describe an exemplary implementation of the processing to pass from a multi-channel representation (5.1 format in the example described) to a binaural or Transaural® stereo dual-channel representation. In this figure, five loudspeakers in a configuration according to the 5.1 format are illustrated:

- a front loudspeaker C situated facing the listener, in a mid-plane (plane P of FIG. 2),
- a left lateral loudspeaker AVG,
- a right lateral loudspeaker AVD,
- a rear left loudspeaker ARG to produce a so-called “surround” effect, and
- a rear right loudspeaker ARD to also produce a so-called “surround” effect.

With reference now to FIG. 4B, the playback of the audio content in a binaural or transaural context is intended to be performed on a first track S_(G) and a second track S_(D), this content being initially encoded in a multi-channel format (with N channels with N=5 in the example described) in which each channel is associated with a loudspeaker position with respect to the listener (FIG. 4A).

Advantageously, the channels associated with positions of loudspeakers (for example the loudspeakers AVG and ARG of FIG. 4A) in a first hemisphere with respect to the listener (that of the left ear OG) are grouped together and applied directly to the track S_(G) of FIG. 4B. The channels associated with the positions of the loudspeakers AVD and ARD in a second hemisphere with respect to the listener (that of his right ear OD) are grouped together and applied directly to the other track S_(D) of FIG. 4B. It is specified that the first and second hemispheres are separated by the mid-plane of the listener. It will be noted, in the example of FIG. 4B, that no particular processing is applied to the components of signals AVG, ARG applied directly to the track S_(G), on the one hand, nor to the components of signals AVD, ARD applied directly to the track S_(D), on the other hand.

Again with reference to FIG. 4B, the channels AVG and ARG associated with positions of the first hemisphere are grouped together and also applied to the second track S_(D), and the channels AVD and ARD associated with positions of the second hemisphere are grouped together and also applied to the first track S_(G). Here, provision is made for an additional processing to be applied:

- to each channel AVG and ARG of the first hemisphere intended for the second track S_(D), and
- to each channel AVD and ARD of the second hemisphere intended for the first track S_(G).

The additional processing preferably comprises the application of a filtering (C/I)_(AVG), (C/I)_(AVD), (C/I)_(ARG), (C/I)_(ARD) (FIG. 4B) defined, in the coded (or transformed) domain, by the spectrum of a contralateral acoustic transfer function deconvolved with an ipsilateral transfer function. More precisely, the ipsilateral transfer function is associated with a direct acoustic pathway I_(AVG), I_(AVD), I_(ARG), I_(ARD) (FIG. 4A) between a loudspeaker position and one ear of the listener and the contralateral transfer function is associated with an acoustic pathway C_(AVG), C_(AVD), C_(ARG), C_(ARD) (FIG. 4A) passing through the head of the listener, between the aforementioned loudspeaker position and the other ear of the listener.

Thus, for each channel associated with a virtual loudspeaker situated outside of the mid-plane (therefore all the loudspeakers except the front loudspeaker), the spatialization of the virtual loudspeaker is ensured by a pair of transfer functions, HRTF (expressed in the frequency domain) or HRIR (expressed in the time domain). These transfer functions translate the ipsilateral path (direct path between the loudspeaker and the closer ear, solid line in FIG. 4A) and the contralateral path (path between the loudspeaker and the ear masked by the listener's head, dashed lines in FIG. 4A).

Rather than using raw transfer functions for each path as in the prior art, the filter associated with the ipsilateral path is advantageously eliminated and a filter corresponding to the contralateral transfer function deconvolved with the ipsilateral transfer function is used for the contralateral path. Thus, for each virtual loudspeaker (except for the central loudspeaker C), a single filter is used.

Thus, with reference to FIG. 4B:

- the filter referenced (C/I)_(ARG) is defined, in the transformed domain, by the spectrum of the contralateral transfer function of the path between the rear left loudspeaker ARG and the right ear OD deconvolved with the ipsilateral transfer function of the path between the rear left loudspeaker ARG and the left ear OG of the individual,
- the filter referenced (C/I)_(ARD) is defined, in the transformed domain, by the spectrum of the contralateral transfer function of the path between the right rear loudspeaker ARD and the left ear OG deconvolved with the ipsilateral transfer function of the path between the right rear loudspeaker ARD and the right ear OD of the individual,
- the filter referenced (C/I)_(AVG) is defined, in the transformed domain, by the spectrum of the contralateral transfer function of the path between the left lateral loudspeaker AVG and the right ear OD deconvolved with the ipsilateral transfer function of the path between the left lateral loudspeaker AVG and the left ear OG of the individual, and
- the filter referenced (C/I)_(AVD) is defined, in the transformed domain, by the spectrum of the contralateral transfer function of the path between the right lateral loudspeaker AVD and the left ear OG deconvolved with the ipsilateral transfer function of the path between the right lateral loudspeaker AVD and the right ear OD of the individual.

Moreover, the signal which, in 5.1 encoding, is intended to feed the central loudspeaker C (in the mid-plane of symmetry of the listener's head) is split into two fractions (preferably equal, 50% and 50%) which are added respectively to the tracks of the left and right lateral loudspeakers. In the same manner, if there is provision for a rear loudspeaker in the mid-plane, the associated signal is mixed with the signals associated with the rear left ARG and rear right ARD loudspeakers. Of course, if there are several central loudspeakers (front loudspeaker for playback of the middle frequencies, front loudspeaker for playback of the low frequencies, or the like) their signals are added together and again apportioned over the signals associated with the lateral loudspeakers.

As the channel associated with a loudspeaker central position C, in the mid-plane, is apportioned in a first and a second signal fraction, respectively added to the channel of the loudspeaker AVG in the first hemisphere (around the left ear OG) and to the channel of the loudspeaker AVD in the second hemisphere (around the right ear OD), it is not necessary to make provision for filterings by the transfer functions associated with the loudspeakers situated in the mid-plane, this being the case with no change in the perception of the spatialization of the sound scene in binaural or Transaural® playback.

Of course, provision can also be made for a processing for passing from a multi-channel format with N channels, with N still larger than 5 (7.1 format or the like) to a binaural format. For this purpose, it suffices, by adding two extra lateral loudspeakers, to provide for the same types of filters (represented by the contralateral HRTF deconvolved with the ipsilateral HRTF) for example for two additional loudspeakers in the 7.1 initial format.

The processing complexity is greatly reduced since the filters associated with the loudspeakers situated in the mid-plane are eliminated. Another advantage is that the effect of coloration of the associated signals is reduced.

The spectrum of the contralateral transfer function deconvolved with the ipsilateral transfer function may be defined, in the transformed domain, by:

- the gain of the transform of the contralateral transfer function deconvolved with the ipsilateral transfer function, and
- the delay defined by the difference of the respective phases of the contralateral and ipsilateral transfer functions,
- and optionally as a function of an estimation of coherence between the left track and the right track, in particular in the case of a single initial mono source to be spatialized in the 5.1 format and then in the binaural format (this case being described further on).

As a first approximation, it may simply be considered that the ratio of the respective gains of the transforms of the transfer functions, in each frequency band considered, is close to the gain of the transform of the contralateral transfer function deconvolved with the ipsilateral transfer function. The gains of the transforms of the contralateral and ipsilateral transfer functions, as well as their phases, in each spectral band, are given for example in annex C of the aforementioned standard “Information technology—MPEG audio technologies—Part 1: MPEG Surround”, ISO/IEC JTC 1/SC 29 (21 Jul. 2006), for a PQMF transform in 64 sub-bands.

Thus, as a first approximation, for a contralateral path and in a given spectral band m, the spectrum of the contralateral transfer function deconvolved with the ipsilateral transfer function may be defined, in the transformed domain, by:

$P_{R,L}^{m} = \frac{G_{R,L}^{m}}{G_{L,L}^{m}} \exp\left( j\left( \Phi_{R,L}^{m} - \Phi_{L,L}^{m} \right) \right),$

G_(R,L)^(m) and Φ_(R,L)^(m) being the gain and the phase of the contralateral transfer function and G_(L,L)^(m) and Φ_(L,L)^(m) being the gain and the phase of the ipsilateral transfer function.
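The first-approximation formula above may be illustrated by the following sketch. The function name and the numerical gains and phases are hypothetical; in practice the values would be taken per band from tabulated HRTF data such as the annex cited above.

```python
import cmath


def contralateral_over_ipsilateral(g_contra, phi_contra, g_ipsi, phi_ipsi):
    """Gain ratio and phase difference of the two transfer functions in one band."""
    return (g_contra / g_ipsi) * cmath.exp(1j * (phi_contra - phi_ipsi))


# Hypothetical band: contralateral gain 0.5 and phase 0.3 rad,
# ipsilateral gain 1.0 and phase 0.1 rad.
p = contralateral_over_ipsilateral(0.5, 0.3, 1.0, 0.1)
```

The modulus of the resulting coefficient is the ratio of the gains and its argument is the phase difference, here 0.5 and 0.2 rad respectively.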

With reference to FIG. 5, each filter is equivalent to applying:

- an equalizer filtering 11, preferably of low-pass type,
- advantageously an interaural delay (or “ITD”) 10, to take account of the path differences between a virtual source and each ear, and
- optionally an attenuation 12 with respect to the unfiltered components of signals (for example the component AVG on the track S_(G) of FIG. 4B).

It is appropriate to indicate here that the delay ITD applied is “substantially” interaural, the term “substantially” referring in particular to the fact that rigorous account may not be taken of the strict morphology of the listener (for example if HRTFs are used by default, in particular HRTFs termed “Kemar's head”).

Thus, the binaural synthesis of a virtual loudspeaker (AVG for example) consists simply in playing the input signal without modification on the ipsilateral track (track S_(G) in FIG. 4B) and in applying to the signal to be played on the contralateral track (track S_(D) in FIG. 4B) a corresponding filter (C/I)_(AVG), which amounts to the application of a delay, of an attenuation and of a low-pass filtering. Thus, the resulting signal is delayed, attenuated and filtered by eliminating the high frequencies, this being manifested, from the point of view of auditory perception, by a masking of the signal received by the “contralateral” ear (OD, in the example where the virtual loudspeaker is the left lateral AVG), in relation to the signal received by the “ipsilateral” ear (OG).

The coloration which may be perceived is therefore directly that of the signal received by the ipsilateral ear. Now, in an advantageous manner, this signal does not undergo any transformation and, consequently, the processing within the meaning of the invention ought to afford only weak coloration. However, by way of complementary precaution, with reference to FIG. 6, provision may be made for a processing of the output signals S_(G) and S_(D) of FIG. 4B consisting in applying a high-pass filter FPH, followed by an automatic gain control CAG.

The high-pass filter amounts to applying the “Gain” factor described hereinabove, with:

- Gain=0.5 if the frequency f is less than 500 Hz and
- Gain=1 otherwise.

Advantageously, in this embodiment, this factor is applied globally at the output of the signals S_(G) and S_(D), as a variant of an individual application to each coefficient of the matrix

$\begin{bmatrix} h_{L,L}^{l,\kappa(k)} & h_{L,R}^{l,\kappa(k)} & h_{L,C}^{l,\kappa(k)} \\ h_{R,L}^{l,\kappa(k)} & h_{R,R}^{l,\kappa(k)} & h_{R,C}^{l,\kappa(k)} \end{bmatrix}$

explained further on.

Advantageously, the automatic gain control is tied to the global intensity of the signals corresponding to the Downmix processing, given by:

$I_{D} = \sqrt{I_{AVG}^{2} + I_{AVD}^{2} + g_{s}^{2} I_{ARG}^{2} + g_{s}^{2} I_{ARD}^{2} + g^{2} I_{C}^{2}},$

where $I_{AVG}^{2}, I_{AVD}^{2}, I_{ARG}^{2}, I_{ARD}^{2}, I_{C}^{2}$ are the respective energies of the signals of the front left, front right, rear left, rear right and center channels of a 5.1 format. The gain g is applied globally to the signal C and the gain g_(s) to the signals ARG and ARD. Stated otherwise, the energy of the left track signals S′_(G) and right track signals S′_(D) is thereby limited on completion of this processing, at most, to the global energy I_(D)² of the signals of the virtual loudspeakers. The recovered signals S′_(G) and S′_(D) may ultimately be conveyed to a device for sound playback, in binaural stereophonic mode.
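A possible reading of this energy limitation is sketched below. The function names are hypothetical, the default values g = g_s = 0.707 are assumed from the Downmix ITU example mentioned earlier, and applying one common scale factor to both tracks is an assumption of this sketch rather than a prescribed implementation of the automatic gain control.

```python
import math


def downmix_intensity(e_avg, e_avd, e_arg, e_ard, e_c, g=0.707, g_s=0.707):
    """I_D computed from the channel energies (the squared terms of the formula)."""
    return math.sqrt(e_avg + e_avd + g_s**2 * (e_arg + e_ard) + g**2 * e_c)


def limit_to_downmix_energy(s_g, s_d, i_d):
    """Scale both tracks by one factor so their total energy is at most I_D squared."""
    energy = sum(abs(x) ** 2 for x in s_g) + sum(abs(x) ** 2 for x in s_d)
    if energy <= i_d**2:
        # Already within the limit: the tracks are returned unchanged.
        return list(s_g), list(s_d)
    scale = i_d / math.sqrt(energy)
    return [scale * x for x in s_g], [scale * x for x in s_d]


# Hypothetical one-sample tracks with total energy 25, limited to I_D = 2.5.
out_g, out_d = limit_to_downmix_energy([3.0], [4.0], 2.5)
```

In this example the total energy of the two tracks is brought down from 25 to 6.25, that is to I_D squared.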

In practice, in a coder in particular of MPEG Surround type, the global intensity of the signals is customarily calculated directly on the basis of the energy of the input signals. Thus, in a variant this datum will be taken into account in estimating the intensity I_(D).

The implementation of the invention then results in elimination of the monaural location indices. Now, the more a source deviates from the mid-plane, the more predominant the interaural indices become, to the detriment of the monaural indices. Having regard to the fact that in recommendation ITU-R BS.775 relating to the disposition of the loudspeakers of the 5.1 system, the angle between the lateral loudspeakers (or between the rear loudspeakers) is greater than 60°, the elimination of the monaural indices has only little influence on the perceived position of the virtual loudspeakers. Moreover, the difference perceived here is less than the difference that could be perceived by the listener due to the fact that the HRTFs used were not specific to him (for example, models of HRTFs derived from the so-called “Kemar head” technique).

Thus, the spatial perception of the signal is kept, without introducing coloration and while preserving the timbre of the sound sources.

Further still, the solution within the meaning of the present invention substantially halves the number of filters to be provided and furthermore corrects the coloration effects.

Moreover, it has been observed that the choice of the position of the virtual loudspeakers can appreciably influence the quality of the result of the spatialization. Indeed, it has turned out to be preferable to place the lateral and rear virtual loudspeakers at +/−45° with respect to the mid-plane, rather than at +/−30° to the mid-plane according to the configuration recommended by the International Telecommunication Union (ITU). Indeed, when the virtual loudspeakers approach the mid-plane, the ipsilateral and contralateral HRTF functions tend to resemble one another and the previous simplifications may no longer give satisfactory spatialization.

Thus, in generic terms, by considering an initial multi-channel format defining at least four positions:

- of two lateral loudspeakers, symmetric with respect to the mid-plane, and
- of two rear loudspeakers, symmetric with respect to the mid-plane,

the position of a lateral loudspeaker is advantageously included in an angular sector of 10° to 90° and preferably of 30° to 60° from a symmetry plane P and facing the listener's face. More particularly, the position of a lateral loudspeaker will preferably be close to 45° from the symmetry plane.

FIG. 7 is now referred to in order to describe a possible embodiment of the invention in which the processing within the meaning of the invention intervenes after the step of coding the sound data, for example before transmission to a decoder 74 via a network 73. Here, a processing module within the meaning of the invention 72 intervenes directly downstream of a coder 71, so as to deliver, as indicated previously, data processed according to a processing of the type:

- Downmix+αABD (with ABD for “Additional Binaural Downmix”).

A possible embodiment of such a processing is described hereinafter.

Starting from a 5.0 signal (L, R, C, Ls, Rs) to be coded and transported, we thus consider a global Downmix processing of the type:

$\begin{bmatrix}L_{0}^{l,m} \\R_{0}^{l,m}\end{bmatrix} = \begin{bmatrix}{L^{l,m} + {g*C^{l,m}} + L_{s}^{l,m}} \\{R^{l,m} + {g*C^{l,m}} + R_{s}^{l,m}}\end{bmatrix}$

The signals L₀^(l,m) and R₀^(l,m) therefore correspond to the two stereo signals, without spatialization effect, that could be delivered by a decoder so as to feed two loudspeakers in sound playback.

The calculation of the Downmix processing, without binauralization filtering, ought therefore to make it possible to retrieve these two signals L₀^(l,m) and R₀^(l,m), this then being expressed for example as follows:

$\tilde{L}_{0}^{l,m} = \tilde{L}^{l,m} + g\,\tilde{C}^{l,m} + \tilde{L}_{s}^{l,m}$

$\tilde{R}_{0}^{l,m} = \tilde{R}^{l,m} + g\,\tilde{C}^{l,m} + \tilde{R}_{s}^{l,m}$

By now applying a binaural filtering and by apportioning the signal of the central loudspeaker over the channels L and R in an equal manner with the gain g, we obtain:

$\tilde{L}_{B}^{l,m} = ( \tilde{L}^{l,m} + g\,\tilde{C}^{l,m} ) P_{L,L}^{m} + ( \tilde{R}^{l,m} + g\,\tilde{C}^{l,m} ) P_{L,R}^{m} \cdot e^{-j\phi_{R}^{m}} + \tilde{L}_{s}^{l,m} P_{L,L_{s}}^{m} + \tilde{R}_{s}^{l,m} P_{L,R_{s}}^{m} \cdot e^{-j\phi_{R_{s}}^{m}}$

$\tilde{R}_{B}^{l,m} = ( \tilde{R}^{l,m} + g\,\tilde{C}^{l,m} ) P_{R,R}^{m} + ( \tilde{L}^{l,m} + g\,\tilde{C}^{l,m} ) P_{R,L}^{m} \cdot e^{-j\phi_{L}^{m}} + \tilde{R}_{s}^{l,m} P_{R,R_{s}}^{m} + \tilde{L}_{s}^{l,m} P_{R,L_{s}}^{m} \cdot e^{-j\phi_{L_{s}}^{m}}$

If the contralateral HRTF functions deconvolved with the ipsilateral HRTF functions are used for the contralateral filtering, we have $P_{L,L}^{m} = P_{R,R}^{m} = P_{L,L_{s}}^{m} = P_{R,R_{s}}^{m} = 1$, and

${\overset{\sim}{L}}_{B}^{l,m} = {( {{\overset{\sim}{L}}^{l,m} + {g\;{\overset{\sim}{C}}^{l,m}} + {\overset{\sim}{L}}_{s}^{l,m}} ) + {( {{\overset{\sim}{R}}^{\;{l,m}} + {g\;{\overset{\sim}{C}}^{l,m}}} ){P_{L,R}^{m} \cdot {\mathbb{e}}^{{- j}\;\phi_{R}^{m}}}} + {{\overset{\sim}{R}}_{s}^{l,m}{P_{L,R_{s}}^{m} \cdot {\mathbb{e}}^{{- j}\;\phi_{R_{s}}^{m}}}}}$${\overset{\sim}{R}}_{B}^{l,m} = {( {{\overset{\sim}{R}}^{l,m} + {g\;{\overset{\sim}{C}}^{l,m}} + {\overset{\sim}{R}}_{s}^{l,m}} ) + {( {{\overset{\sim}{L}}^{l,m} + {g\;{\overset{\sim}{C}}^{l,m}}} ){P_{R,L}^{m} \cdot {\mathbb{e}}^{{- j}\;\phi_{L}^{m}}}} + {{\overset{\sim}{L}}_{s}^{l,m}{P_{R,L_{s}}^{m} \cdot {\mathbb{e}}^{{- j}\;\phi_{L_{s}}^{m}}}}}$

and therefore:

${\overset{\sim}{L}}_{B}^{l,m} = {{\overset{\sim}{L}}_{0}^{l,m} + {( {{\overset{\sim}{R}}^{l,m} + {g\;{\overset{\sim}{C}}^{l,m}}} ){P_{L,R}^{m} \cdot {\mathbb{e}}^{- {j\phi}_{R}^{m}}}} + {{\overset{\sim}{R}}_{s}^{l,m}{P_{L,R_{s}}^{m} \cdot {\mathbb{e}}^{{- j}\;\phi_{R_{s}}^{m}}}}}$${\overset{\sim}{R}}_{B}^{l,m} = {{\overset{\sim}{R}}_{0}^{l,m} + {( {{\overset{\sim}{L}}^{l,m} + {g\;{\overset{\sim}{C}}^{l,m}}} ){P_{R,L}^{m} \cdot {\mathbb{e}}^{- {j\phi}_{L}^{m}}}} + {{\overset{\sim}{L}}_{s}^{l,m}{P_{R,L_{s}}^{m} \cdot {\mathbb{e}}^{{- j}\;\phi_{L_{s}}^{m}}}}}$

The additional binaural Downmix may be written:

$\tilde{L}_{ABD}^{l,m} = ( \tilde{R}^{l,m} + g\,\tilde{C}^{l,m} ) P_{L,R}^{m} \cdot e^{-j\phi_{R}^{m}} + \tilde{R}_{s}^{l,m} P_{L,R_{s}}^{m} \cdot e^{-j\phi_{R_{s}}^{m}}$

$\tilde{R}_{ABD}^{l,m} = ( \tilde{L}^{l,m} + g\,\tilde{C}^{l,m} ) P_{R,L}^{m} \cdot e^{-j\phi_{L}^{m}} + \tilde{L}_{s}^{l,m} P_{R,L_{s}}^{m} \cdot e^{-j\phi_{L_{s}}^{m}}$
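The decomposition just derived, in which the binaural signal is the Downmix plus the additional binaural Downmix, may be checked numerically with the following sketch for a single sub-band sample. The dictionary keys, the function names and all numerical values are hypothetical placeholders.

```python
import cmath


def downmix(ch, g):
    """Stereo Downmix (L0, R0) of one sub-band sample of a 5.0 signal."""
    return (ch["L"] + g * ch["C"] + ch["Ls"], ch["R"] + g * ch["C"] + ch["Rs"])


def additional_binaural_downmix(ch, g, p, phi):
    """Contralateral-only part: every term carries a filter gain P and a phase."""
    left = ((ch["R"] + g * ch["C"]) * p["L,R"] * cmath.exp(-1j * phi["R"])
            + ch["Rs"] * p["L,Rs"] * cmath.exp(-1j * phi["Rs"]))
    right = ((ch["L"] + g * ch["C"]) * p["R,L"] * cmath.exp(-1j * phi["L"])
             + ch["Ls"] * p["R,Ls"] * cmath.exp(-1j * phi["Ls"]))
    return left, right


def binaural_downmix(ch, g, p, phi, alpha=1.0):
    """Downmix + alpha * additional binaural Downmix, for both tracks."""
    l0, r0 = downmix(ch, g)
    l_abd, r_abd = additional_binaural_downmix(ch, g, p, phi)
    return l0 + alpha * l_abd, r0 + alpha * r_abd


# Hypothetical sub-band sample and filter parameters for one band m.
ch = {"L": 1.0, "R": 0.5, "C": 0.2, "Ls": 0.3, "Rs": 0.1}
p = {"L,R": 0.4, "L,Rs": 0.3, "R,L": 0.4, "R,Ls": 0.3}
phi = {"L": 0.2, "R": 0.2, "Ls": 0.5, "Rs": 0.5}
l_b, r_b = binaural_downmix(ch, 0.707, p, phi)
```

Setting alpha to 0 in this sketch returns the plain Downmix, which corresponds to the “ON-OFF” toggling of the binauralization described earlier.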

Returning to the example of a matrix filtering expressed according to a product of matrices of type:

$H_{1}^{l,m} = \begin{bmatrix} h_{L,L}^{l,m} & h_{L,R}^{l,m} & h_{L,C}^{l,m} \\ h_{R,L}^{l,m} & h_{R,R}^{l,m} & h_{R,C}^{l,m} \end{bmatrix} \cdot \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \end{bmatrix} \cdot W_{temp}^{l,m},$

where W^(l,m) represents a processing matrix for expanding two stereo signals to M′ channels, with M′>2 (for example M′=3), this matrix W^(l,m) being expressed as a 6×2 matrix of the type:

$W^{l,m} = {\begin{pmatrix}w_{11} & w_{12} \\w_{21} & w_{22} \\w_{31} & w_{32} \\w_{41} & w_{42} \\w_{51} & w_{52} \\w_{61} & w_{62}\end{pmatrix}.}$

In particular, in the aforementioned MPEG Surround standard, the coefficients of the matrix

$\begin{bmatrix} h_{L,L}^{l,m} & h_{L,R}^{l,m} & h_{L,C}^{l,m} \\ h_{R,L}^{l,m} & h_{R,R}^{l,m} & h_{R,C}^{l,m} \end{bmatrix}$

are such that:

$\begin{aligned} H_{1}^{l,m} &= \begin{bmatrix} h_{L,L}^{l,m} & h_{L,R}^{l,m} & h_{L,C}^{l,m} \\ h_{R,L}^{l,m} & h_{R,R}^{l,m} & h_{R,C}^{l,m} \end{bmatrix} \cdot \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \end{bmatrix} \cdot W_{temp}^{l,m} \\ &= \begin{bmatrix} 1 & P_{L,R}^{m} e^{-j\phi_{R}^{m}} & g( 1 + P_{L,R}^{m} e^{-j\phi_{R}^{m}} ) & 1 & P_{L,R_{s}}^{m} e^{-j\phi_{R_{s}}^{m}} \\ P_{R,L}^{m} e^{-j\phi_{L}^{m}} & 1 & g( 1 + P_{R,L}^{m} e^{-j\phi_{L}^{m}} ) & P_{R,L_{s}}^{m} e^{-j\phi_{L_{s}}^{m}} & 1 \end{bmatrix} \begin{bmatrix} \sigma_{L}^{l,m} & 0 & 0 \\ 0 & \sigma_{R}^{l,m} & 0 \\ 0 & 0 & 1 \\ \sigma_{L_{s}}^{l,m} & 0 & 0 \\ 0 & \sigma_{R_{s}}^{l,m} & 0 \end{bmatrix} \cdot \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \end{bmatrix} \cdot W_{temp}^{l,\kappa(m)} \end{aligned}$

Expanding this product, we find:

$H_{1}^{l,m} = \begin{bmatrix}\sigma_{L}^{l,m} + \sigma_{L_{s}}^{l,m} & P_{L,R}^{m}e^{-j\phi_{R}^{m}}\sigma_{R}^{l,m} + P_{L,R_{s}}^{m}e^{-j\phi_{R_{s}}^{m}}\sigma_{R_{s}}^{l,m} & g\left(1 + P_{L,R}^{m}e^{-j\phi_{R}^{m}}\right) \\ P_{R,L}^{m}e^{-j\phi_{L}^{m}}\sigma_{L}^{l,m} + P_{R,L_{s}}^{m}e^{-j\phi_{L_{s}}^{m}}\sigma_{L_{s}}^{l,m} & \sigma_{R}^{l,m} + \sigma_{R_{s}}^{l,m} & g\left(1 + P_{R,L}^{m}e^{-j\phi_{L}^{m}}\right)\end{bmatrix} \cdot \begin{bmatrix}1 & 0 & 0 & 0 & 0 & 0 \\0 & 1 & 0 & 0 & 0 & 0 \\0 & 0 & 1 & 0 & 0 & 0\end{bmatrix} \cdot W_{temp}^{l,\kappa(m)}$

Seeking an addition of two distinct matrices, we find:

$H_{1}^{l,m} = \left[ \begin{bmatrix}\sigma_{L}^{l,m} + \sigma_{L_{s}}^{l,m} & 0 & g \\0 & \sigma_{R}^{l,m} + \sigma_{R_{s}}^{l,m} & g\end{bmatrix} + \begin{bmatrix}0 & P_{L,R}^{m}e^{-j\phi_{R}^{m}}\sigma_{R}^{l,m} + P_{L,R_{s}}^{m}e^{-j\phi_{R_{s}}^{m}}\sigma_{R_{s}}^{l,m} & g\,P_{L,R}^{m}e^{-j\phi_{R}^{m}} \\ P_{R,L}^{m}e^{-j\phi_{L}^{m}}\sigma_{L}^{l,m} + P_{R,L_{s}}^{m}e^{-j\phi_{L_{s}}^{m}}\sigma_{L_{s}}^{l,m} & 0 & g\,P_{R,L}^{m}e^{-j\phi_{L}^{m}}\end{bmatrix} \right] \cdot \begin{bmatrix}1 & 0 & 0 & 0 & 0 & 0 \\0 & 1 & 0 & 0 & 0 & 0 \\0 & 0 & 1 & 0 & 0 & 0\end{bmatrix} \cdot W_{temp}^{l,\kappa(m)}$

which will be written hereinafter:

$H_{1}^{l,m} = {H_{DB}^{l,m} = {{\lbrack {h_{D}^{l,m} + h_{ABD}^{l,m}} \rbrack\begin{bmatrix}1 & 0 & 0 & 0 & 0 & 0 \\0 & 1 & 0 & 0 & 0 & 0 \\0 & 0 & 1 & 0 & 0 & 0\end{bmatrix}} \cdot W_{temp}^{l,{\kappa{(m)}}}}}$

with $h_{D}^{l,m}$ for the Downmix processing and $h_{ABD}^{l,m}$ for the Additional Binaural Downmix processing.
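The split into a Downmix part and an Additional Binaural Downmix part can be checked numerically. The following is a minimal sketch for a single time-frequency tile, not an implementation taken from the standard: the function names, the dictionary keys and all numerical values are hypothetical, and only the 2×3 coefficient matrices are built (the selection matrix and the expansion matrix W are left out).

```python
import cmath


def contralateral(p, phi):
    """Deconvolved contralateral spectrum P times the phase term exp(-j*phi)."""
    return p * cmath.exp(-1j * phi)


def split_matrix(sigma, p, phi, g):
    """Build the 2x3 matrices h_D and h_ABD for a single (l, m) tile.

    sigma: spatialization gains keyed 'L', 'Ls', 'R', 'Rs'
    p:     spectra P keyed 'LR', 'LRs', 'RL', 'RLs'
    phi:   phase shifts keyed 'L', 'Ls', 'R', 'Rs'
    g:     gain apportioning the centre channel to left and right
    """
    c_r = contralateral(p['LR'], phi['R'])
    c_rs = contralateral(p['LRs'], phi['Rs'])
    c_l = contralateral(p['RL'], phi['L'])
    c_ls = contralateral(p['RLs'], phi['Ls'])
    # Downmix part: no HRTF-related term at all.
    h_d = [[sigma['L'] + sigma['Ls'], 0.0, g],
           [0.0, sigma['R'] + sigma['Rs'], g]]
    # Additional binaural part: contralateral terms only.
    h_abd = [[0.0, c_r * sigma['R'] + c_rs * sigma['Rs'], g * c_r],
             [c_l * sigma['L'] + c_ls * sigma['Ls'], 0.0, g * c_l]]
    return h_d, h_abd


def add_matrices(a, b):
    """Element-wise sum of two nested-list matrices of equal shape."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


# Hypothetical values for one tile.
sigma = {'L': 0.8, 'Ls': 0.3, 'R': 0.7, 'Rs': 0.4}
p = {'LR': 0.5, 'LRs': 0.4, 'RL': 0.5, 'RLs': 0.4}
phi = {'L': 0.2, 'Ls': 0.3, 'R': 0.2, 'Rs': 0.3}
g = 0.7071

h_d, h_abd = split_matrix(sigma, p, phi, g)
h_full = add_matrices(h_d, h_abd)
```

Keeping the two parts separate makes it possible to weight or skip the binaural part without touching the Downmix part.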

It is possible to consider, in this embodiment, that the coefficients of the matrix

$\begin{bmatrix}h_{L,L}^{l,m} & h_{L,R}^{l,m} & h_{L,C}^{l,m} \\h_{R,L}^{l,m} & h_{R,R}^{l,m} & h_{R,C}^{l,m}\end{bmatrix}$

are indeed given by:

$\begin{aligned}h_{L,L}^{l,m} &= \sigma_{L}^{l,m} + \sigma_{L_{s}}^{l,m} \\ h_{L,R}^{l,m} &= P_{L,R}^{m}e^{-j\phi_{R}^{m}}\sigma_{R}^{l,m} + P_{L,R_{s}}^{m}e^{-j\phi_{R_{s}}^{m}}\sigma_{R_{s}}^{l,m} \\ h_{R,L}^{l,m} &= P_{R,L}^{m}e^{-j\phi_{L}^{m}}\sigma_{L}^{l,m} + P_{R,L_{s}}^{m}e^{-j\phi_{L_{s}}^{m}}\sigma_{L_{s}}^{l,m} \\ h_{R,R}^{l,m} &= \sigma_{R}^{l,m} + \sigma_{R_{s}}^{l,m} \\ h_{L,C}^{l,m} &= g\left(1 + P_{L,R}^{m} \cdot e^{-j\phi_{R}^{m}}\right) \\ h_{R,C}^{l,m} &= g\left(1 + P_{R,L}^{m} \cdot e^{-j\phi_{L}^{m}}\right)\end{aligned}$

as set forth previously.

It is possible to consider, as a first approximation, that a lateral channel (right or left) and the corresponding rear lateral channel (right or left respectively) are mutually decorrelated. This assumption is reasonable insofar as the rear channel in general merely takes up the hall reverberation or the like (delayed in time) of the signal of the lateral channel. In this case, the channels L and Ls and the channels R and Rs have disjoint time-frequency supports, and we then have $\sigma_{L}^{l,m}\sigma_{L_{s}}^{l,m} = 0$ and $\sigma_{R}^{l,m}\sigma_{R_{s}}^{l,m} = 0$, and:

$h_{L,L}^{l,m} = \sigma_{L}^{l,m} + \sigma_{L_{s}}^{l,m} = \sqrt{\left(\sigma_{L}^{l,m} + \sigma_{L_{s}}^{l,m}\right)^{2}} = \sqrt{(\sigma_{L}^{l,m})^{2} + 2\,\sigma_{L}^{l,m}\sigma_{L_{s}}^{l,m} + (\sigma_{L_{s}}^{l,m})^{2}} = \sqrt{(\sigma_{L}^{l,m})^{2} + (\sigma_{L_{s}}^{l,m})^{2}}$

$h_{R,R}^{l,m} = \sigma_{R}^{l,m} + \sigma_{R_{s}}^{l,m} = \sqrt{\left(\sigma_{R}^{l,m} + \sigma_{R_{s}}^{l,m}\right)^{2}} = \sqrt{(\sigma_{R}^{l,m})^{2} + 2\,\sigma_{R}^{l,m}\sigma_{R_{s}}^{l,m} + (\sigma_{R_{s}}^{l,m})^{2}} = \sqrt{(\sigma_{R}^{l,m})^{2} + (\sigma_{R_{s}}^{l,m})^{2}}$

On the other hand, the above assumption cannot be satisfied for all signals. Where the signals have a common time-frequency support, it is preferable to seek to preserve the energies of the signals. This precaution is moreover advocated in the MPEG Surround standard. Indeed, the addition of signals in phase opposition ($\sigma_{L}^{l,m} = -\sigma_{L_{s}}^{l,m}$) cancels out. As indicated above, such a situation never occurs in practice, when considering the case of a hall with a reverberation effect on the Surround channels.
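The difference between the plain sum of gains and the energy-preserving form can be illustrated with two numbers. This is a small sketch under the stated assumptions only: the function names and the sample gains are hypothetical, and the gains are treated as real values for a single tile.

```python
import math


def linear_gain(sigma_direct, sigma_ambience):
    """Plain sum of the two gains, sigma_L + sigma_Ls."""
    return sigma_direct + sigma_ambience


def energy_preserving_gain(sigma_direct, sigma_ambience):
    """Root of the summed energies, sqrt(sigma_L^2 + sigma_Ls^2)."""
    return math.hypot(sigma_direct, sigma_ambience)


# Disjoint supports: one of the two gains is zero, both forms agree.
disjoint = (linear_gain(0.8, 0.0), energy_preserving_gain(0.8, 0.0))

# Phase opposition: the plain sum cancels, the energy form does not.
opposed = (linear_gain(0.5, -0.5), energy_preserving_gain(0.5, -0.5))
```

In the second case the plain sum is exactly zero, whereas the energy-preserving form keeps the combined level of the two components.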

Nonetheless, in the example described below, variants of the above formulae are used to retain the energy of the signals in the Downmix processing, as follows:

$h_{L,C}^{l,m} = g\left(1 + P_{L,R}^{m} \cdot e^{-j\phi_{R}^{m}}\right)$

$h_{R,C}^{l,m} = g\left(1 + P_{R,L}^{m} \cdot e^{-j\phi_{L}^{m}}\right)$

$h_{L,L}^{l,m} = \sqrt{(\sigma_{L}^{l,m})^{2} + (\sigma_{L_{s}}^{l,m})^{2}}$

$h_{R,L}^{l,m} = e^{-j\left(w_{L}^{l,m}\phi_{L}^{m} + w_{L_{s}}^{l,m}\phi_{L_{s}}^{m}\right)}\sqrt{(\sigma_{L}^{l,m})^{2}(P_{R,L}^{m})^{2} + (\sigma_{L_{s}}^{l,m})^{2}(P_{R,L_{s}}^{m})^{2}}$

$h_{L,R}^{l,m} = e^{-j\left(w_{R}^{l,m}\phi_{R}^{m} + w_{R_{s}}^{l,m}\phi_{R_{s}}^{m}\right)}\sqrt{(\sigma_{R}^{l,m})^{2}(P_{L,R}^{m})^{2} + (\sigma_{R_{s}}^{l,m})^{2}(P_{L,R_{s}}^{m})^{2}}$

$h_{R,R}^{l,m} = \sqrt{(\sigma_{R}^{l,m})^{2} + (\sigma_{R_{s}}^{l,m})^{2}}$

The global processing matrix $H_{1}^{l,m}$ is still expressed as the sum of two matrices:

$H_{1}^{l,m} = H_{D}^{l,m} + H_{ABD}^{l,m} = \left[ h_{D}^{l,m} + h_{ABD}^{l,m} \right] \begin{bmatrix}1 & 0 & 0 & 0 & 0 & 0 \\0 & 1 & 0 & 0 & 0 & 0 \\0 & 0 & 1 & 0 & 0 & 0\end{bmatrix} \cdot W_{temp}^{l,\kappa(m)},$

with:

$H_{D}^{l,m} = \begin{bmatrix}\sqrt{(\sigma_{L}^{l,m})^{2} + (\sigma_{L_{s}}^{l,m})^{2}} & 0 & g \\0 & \sqrt{(\sigma_{R}^{l,m})^{2} + (\sigma_{R_{s}}^{l,m})^{2}} & g\end{bmatrix} \begin{bmatrix}1 & 0 & 0 & 0 & 0 & 0 \\0 & 1 & 0 & 0 & 0 & 0 \\0 & 0 & 1 & 0 & 0 & 0\end{bmatrix} \cdot W_{temp}^{l,\kappa(m)}$

and

$H_{ABD}^{l,m} = \begin{bmatrix}0 & X_{12} & g\,P_{L,R}^{m}e^{-j\phi_{R}^{m}} \\X_{21} & 0 & g\,P_{R,L}^{m}e^{-j\phi_{L}^{m}}\end{bmatrix} \cdot \begin{bmatrix}1 & 0 & 0 & 0 & 0 & 0 \\0 & 1 & 0 & 0 & 0 & 0 \\0 & 0 & 1 & 0 & 0 & 0\end{bmatrix} \cdot W_{temp}^{l,\kappa(m)},$

with:

$X_{21} = \sqrt{(\sigma_{L}^{l,m})^{2}(P_{R,L}^{m})^{2} + (\sigma_{L_{s}}^{l,m})^{2}(P_{R,L_{s}}^{m})^{2}} \cdot e^{-j\left(w_{L}^{l,m}\phi_{L}^{m} + w_{L_{s}}^{l,m}\phi_{L_{s}}^{m}\right)}$

and

$X_{12} = \sqrt{(\sigma_{R}^{l,m})^{2}(P_{L,R}^{m})^{2} + (\sigma_{R_{s}}^{l,m})^{2}(P_{L,R_{s}}^{m})^{2}} \cdot e^{-j\left(w_{R}^{l,m}\phi_{R}^{m} + w_{R_{s}}^{l,m}\phi_{R_{s}}^{m}\right)}$
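A contralateral term of the form of X₂₁ or X₁₂ combines an energy-preserving magnitude with a weighted phase. The sketch below computes one such term; the function name, argument order and numerical values are hypothetical, and the weights are simply passed in as chosen values.

```python
import cmath
import math


def contralateral_term(sigma_d, sigma_a, p_d, p_a, phi_d, phi_a, w_d, w_a):
    """Energy-preserving contralateral coefficient for one (l, m) tile.

    sigma_d, sigma_a: gains of the direct and ambience loudspeakers
    p_d, p_a:         deconvolved contralateral spectra for those loudspeakers
    phi_d, phi_a:     corresponding phase shifts
    w_d, w_a:         chosen weightings applied to the two phases
    """
    magnitude = math.hypot(sigma_d * p_d, sigma_a * p_a)
    phase = w_d * phi_d + w_a * phi_a
    return magnitude * cmath.exp(-1j * phase)


# Hypothetical values: direct loudspeaker dominant, so its phase gets most weight.
x21 = contralateral_term(0.8, 0.3, 0.5, 0.4, 0.2, 0.3, w_d=0.9, w_a=0.1)

# With no ambience component and w_d = 1 the term reduces to sigma * P * exp(-j*phi).
x21_direct_only = contralateral_term(0.8, 0.0, 0.5, 0.4, 0.2, 0.3, w_d=1.0, w_a=0.0)
```

When the ambience gain is zero the result coincides with the non energy-preserving formula, which is the behaviour expected under the disjoint-support assumption.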

The matrix $H_{D}^{l,m}$ does not contain any term relating to the HRTF filtering coefficients. This matrix globally processes the operations for spatializing two channels (M=2) to five channels (N=5) and the operations for sub-mixing these five channels to two channels. In a particular embodiment in which a “Downmix” signal arising from the 5.0 signals to be coded is transported, the coefficients g, w_(j), $\sigma_{L}^{l,m}$, $\sigma_{L_{s}}^{l,m}$, $\sigma_{R}^{l,m}$ and $\sigma_{R_{s}}^{l,m}$ may be calculated by the coder so that this matrix approximates the unit matrix. Indeed, we must have:

$\begin{matrix}{\begin{bmatrix}{\overset{\sim}{L}}_{0}^{l,m} \\{\overset{\sim}{R}}_{0}^{l,m}\end{bmatrix} = {H_{D}^{l,m}\begin{bmatrix}L_{0}^{l,m} \\R_{0}^{l,m}\end{bmatrix}}} & \;\end{matrix}$

The matrix $H_{ABD}^{l,m}$ consists for its part in applying filterings based on contralateral HRTF functions deconvolved with ipsilateral functions. It will be noted that the involvement of a Downmix processing described hereinabove is a particular embodiment. The invention may also be implemented with other types of Downmix matrices.

Moreover, the embodiment introduced hereinabove is described by way of example. It is indeed apparent that it is not necessary, in practice, to seek to estimate the signals L₀ and R₀ by applying the matrix $H_{D}^{l,m}$, since these signals are transmitted from the coder to the decoder. The signals $\tilde{L}_{0}$ and $\tilde{R}_{0}$, and optionally the spatialization parameters, are thus available to the decoder for reconstructing the signals for sound playback (optionally binaural if the decoder has indeed received the spatialization parameters). The latter embodiment exhibits two advantages. On the one hand, the number of processing procedures to be carried out to retrieve the signals L₀ and R₀ is thus reduced. On the other hand, the quality of the output signals is improved: passage to the transformed domain and return to the starting domain, as well as the application of the matrix $H_{D}^{l,m}$, necessarily degrade the signals. An advantageous embodiment therefore consists in applying the following processing:

$\begin{bmatrix}\tilde{L}_{B}^{l,m} \\ \tilde{R}_{B}^{l,m}\end{bmatrix} = \begin{bmatrix}L_{0}^{l,m} \\ R_{0}^{l,m}\end{bmatrix} + H_{ABD}^{l,m}\begin{bmatrix}L_{0}^{l,m} \\ R_{0}^{l,m}\end{bmatrix}$
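The processing above leaves the transmitted stereo pair untouched and only adds the contribution of the additional binaural Downmix matrix. A minimal sketch for one tile follows, assuming that this matrix has already been reduced to a 2×2 complex matrix; the function name and the sample values are hypothetical.

```python
import cmath


def binauralize(l0, r0, h_add):
    """Add the additional binaural contribution to the transmitted pair (l0, r0).

    h_add is a 2x2 nested list of complex coefficients for one (l, m) tile.
    """
    l_b = l0 + h_add[0][0] * l0 + h_add[0][1] * r0
    r_b = r0 + h_add[1][0] * l0 + h_add[1][1] * r0
    return l_b, r_b


# Hypothetical tile: only cross (contralateral) terms, attenuated and phase-shifted.
cross = 0.5 * cmath.exp(-0.2j)
h_add = [[0.0, cross], [cross, 0.0]]

# A signal present on the left track only leaks, filtered, into the right track.
l_b, r_b = binauralize(1.0 + 0.0j, 0.0 + 0.0j, h_add)

# With a zero matrix the stereo pair passes through unchanged.
passthrough = binauralize(0.3 + 0.1j, -0.2 + 0.4j, [[0.0, 0.0], [0.0, 0.0]])
```

Because the direct path is an identity, no matrix needs to be applied to recover L₀ and R₀ themselves, which reflects the reduction in processing described above.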

It is apparent moreover that the matrix $H_{1}^{l,m}$ can be further simplified. Indeed, returning to the expression:

$H_{1}^{l,m} = \begin{bmatrix}h_{L,L}^{l,m} & h_{L,R}^{l,m} & h_{L,C}^{l,m} \\h_{R,L}^{l,m} & h_{R,R}^{l,m} & h_{R,C}^{l,m}\end{bmatrix} \cdot \begin{bmatrix}1 & 0 & 0 & 0 & 0 & 0 \\0 & 1 & 0 & 0 & 0 & 0 \\0 & 0 & 1 & 0 & 0 & 0\end{bmatrix} \cdot W_{temp}^{l,m},$

it is possible to calculate the expressions for the five intermediate signals with the binaural Downmix processing as follows:

$\begin{aligned}\tilde{L}^{l,m} &= \sigma_{L}^{l,m}\left(w_{11}L_{0}^{l,m} + w_{12}R_{0}^{l,m}\right) \\ \tilde{R}^{l,m} &= \sigma_{R}^{l,m}\left(w_{21}L_{0}^{l,m} + w_{22}R_{0}^{l,m}\right) \\ \tilde{C}^{l,m} &= \sigma_{C}^{l,m}\left(w_{31}L_{0}^{l,m} + w_{32}R_{0}^{l,m}\right) \\ \tilde{L}_{s}^{l,m} &= \sigma_{L_{s}}^{l,m}\left(w_{11}L_{0}^{l,m} + w_{12}R_{0}^{l,m}\right) \\ \tilde{R}_{s}^{l,m} &= \sigma_{R_{s}}^{l,m}\left(w_{21}L_{0}^{l,m} + w_{22}R_{0}^{l,m}\right)\end{aligned}$

Again with $P_{L,L}^{m} = P_{R,R}^{m} = P_{L,L_{s}}^{m} = P_{R,R_{s}}^{m} = 1$, we obtain:

$\begin{aligned}\tilde{L}_{B}^{l,m} ={}& \left(\sigma_{L}^{l,m}(w_{11}L_{0}^{l,m} + w_{12}R_{0}^{l,m}) + g\,\sigma_{C}^{l,m}(w_{31}L_{0}^{l,m} + w_{32}R_{0}^{l,m}) + \sigma_{L_{s}}^{l,m}(w_{11}L_{0}^{l,m} + w_{12}R_{0}^{l,m})\right) \\ &+ \left(\sigma_{R}^{l,m}(w_{21}L_{0}^{l,m} + w_{22}R_{0}^{l,m}) + g\,\sigma_{C}^{l,m}(w_{31}L_{0}^{l,m} + w_{32}R_{0}^{l,m})\right)P_{L,R}^{m} \cdot e^{-j\phi_{R}^{m}} \\ &+ \sigma_{R_{s}}^{l,m}(w_{21}L_{0}^{l,m} + w_{22}R_{0}^{l,m})\,P_{L,R_{s}}^{m} \cdot e^{-j\phi_{R_{s}}^{m}}\end{aligned}$

and

$\begin{aligned}\tilde{R}_{B}^{l,m} ={}& \left(\sigma_{R}^{l,m}(w_{21}L_{0}^{l,m} + w_{22}R_{0}^{l,m}) + g\,\sigma_{C}^{l,m}(w_{31}L_{0}^{l,m} + w_{32}R_{0}^{l,m}) + \sigma_{R_{s}}^{l,m}(w_{21}L_{0}^{l,m} + w_{22}R_{0}^{l,m})\right) \\ &+ \left(\sigma_{L}^{l,m}(w_{11}L_{0}^{l,m} + w_{12}R_{0}^{l,m}) + g\,\sigma_{C}^{l,m}(w_{31}L_{0}^{l,m} + w_{32}R_{0}^{l,m})\right)P_{R,L}^{m} \cdot e^{-j\phi_{L}^{m}} \\ &+ \sigma_{L_{s}}^{l,m}(w_{11}L_{0}^{l,m} + w_{12}R_{0}^{l,m})\,P_{R,L_{s}}^{m} \cdot e^{-j\phi_{L_{s}}^{m}}\end{aligned}$

Expanding these expressions, we find:

$\begin{aligned}\tilde{L}_{B}^{l,m} ={}& \left(\sigma_{L}^{l,m}w_{11} + g\,\sigma_{C}^{l,m}w_{31} + \sigma_{L_{s}}^{l,m}w_{11} + (\sigma_{R}^{l,m}w_{21} + g\,\sigma_{C}^{l,m}w_{31})P_{L,R}^{m} \cdot e^{-j\phi_{R}^{m}} + \sigma_{R_{s}}^{l,m}w_{21}P_{L,R_{s}}^{m} \cdot e^{-j\phi_{R_{s}}^{m}}\right)L_{0}^{l,m} \\ &+ \left(\sigma_{L}^{l,m}w_{12} + g\,\sigma_{C}^{l,m}w_{32} + \sigma_{L_{s}}^{l,m}w_{12} + (\sigma_{R}^{l,m}w_{22} + g\,\sigma_{C}^{l,m}w_{32})P_{L,R}^{m} \cdot e^{-j\phi_{R}^{m}} + \sigma_{R_{s}}^{l,m}w_{22}P_{L,R_{s}}^{m} \cdot e^{-j\phi_{R_{s}}^{m}}\right)R_{0}^{l,m}\end{aligned}$

and

$\begin{aligned}\tilde{R}_{B}^{l,m} ={}& \left(\sigma_{R}^{l,m}w_{21} + g\,\sigma_{C}^{l,m}w_{31} + \sigma_{R_{s}}^{l,m}w_{21} + (\sigma_{L}^{l,m}w_{11} + g\,\sigma_{C}^{l,m}w_{31})P_{R,L}^{m} \cdot e^{-j\phi_{L}^{m}} + \sigma_{L_{s}}^{l,m}w_{11}P_{R,L_{s}}^{m} \cdot e^{-j\phi_{L_{s}}^{m}}\right)L_{0}^{l,m} \\ &+ \left(\sigma_{R}^{l,m}w_{22} + g\,\sigma_{C}^{l,m}w_{32} + \sigma_{R_{s}}^{l,m}w_{22} + (\sigma_{L}^{l,m}w_{12} + g\,\sigma_{C}^{l,m}w_{32})P_{R,L}^{m} \cdot e^{-j\phi_{L}^{m}} + \sigma_{L_{s}}^{l,m}w_{12}P_{R,L_{s}}^{m} \cdot e^{-j\phi_{L_{s}}^{m}}\right)R_{0}^{l,m}\end{aligned}$
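That the expanded form is equivalent to first computing the five intermediate signals can be checked numerically. The sketch below does so for a single tile; the function names, the layout of `w` (rows 1 to 3 of W, two columns) and all values are hypothetical, and the plain (non energy-preserving) sums are used.

```python
import cmath


def phase_terms(p, phi):
    """Contralateral factors P * exp(-j*phi) for R, Rs, L and Ls."""
    return (p['LR'] * cmath.exp(-1j * phi['R']),
            p['LRs'] * cmath.exp(-1j * phi['Rs']),
            p['RL'] * cmath.exp(-1j * phi['L']),
            p['RLs'] * cmath.exp(-1j * phi['Ls']))


def via_intermediate(l0, r0, w, sigma, p, phi, g):
    """Binaural pair obtained through the five intermediate signals."""
    c_r, c_rs, c_l, c_ls = phase_terms(p, phi)
    mix = [w[i][0] * l0 + w[i][1] * r0 for i in range(3)]
    sig_l, sig_r, sig_c = sigma['L'] * mix[0], sigma['R'] * mix[1], sigma['C'] * mix[2]
    sig_ls, sig_rs = sigma['Ls'] * mix[0], sigma['Rs'] * mix[1]
    l_b = (sig_l + g * sig_c + sig_ls) + (sig_r + g * sig_c) * c_r + sig_rs * c_rs
    r_b = (sig_r + g * sig_c + sig_rs) + (sig_l + g * sig_c) * c_l + sig_ls * c_ls
    return l_b, r_b


def direct_matrix(w, sigma, p, phi, g):
    """2x2 matrix acting directly on (L0, R0), from the expanded expressions."""
    c_r, c_rs, c_l, c_ls = phase_terms(p, phi)
    left, right = [], []
    for k in (0, 1):  # k = 0: coefficient of L0, k = 1: coefficient of R0
        w1, w2, w3 = w[0][k], w[1][k], w[2][k]
        centre = g * sigma['C'] * w3
        left.append((sigma['L'] + sigma['Ls']) * w1 + centre
                    + (sigma['R'] * w2 + centre) * c_r + sigma['Rs'] * w2 * c_rs)
        right.append((sigma['R'] + sigma['Rs']) * w2 + centre
                     + (sigma['L'] * w1 + centre) * c_l + sigma['Ls'] * w1 * c_ls)
    return [left, right]


# Hypothetical tile values.
w = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]
sigma = {'L': 0.8, 'Ls': 0.3, 'R': 0.7, 'Rs': 0.4, 'C': 1.0}
p = {'LR': 0.5, 'LRs': 0.4, 'RL': 0.5, 'RLs': 0.4}
phi = {'L': 0.2, 'Ls': 0.3, 'R': 0.25, 'Rs': 0.35}
g, l0, r0 = 0.7071, 0.6 - 0.2j, -0.1 + 0.5j

h = direct_matrix(w, sigma, p, phi, g)
expected = via_intermediate(l0, r0, w, sigma, p, phi, g)
direct = (h[0][0] * l0 + h[0][1] * r0, h[1][0] * l0 + h[1][1] * r0)
```

The direct form needs only four complex multiplications per tile once the matrix is known, against the larger number of operations of the intermediate-signal route.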

These expressions are simplified with respect to their customary calculation. It is nonetheless possible, here again, to take the precaution not to lead to a cancellation of signals in phase opposition by seeking to preserve the energy levels of the various signals in the Downmix processing, as advocated hereinabove. We then obtain:

$\begin{aligned}\tilde{L}_{B}^{l,m} ={}& \left(\sqrt{(\sigma_{L}^{l,m}w_{11})^{2} + (g\,\sigma_{C}^{l,m}w_{31})^{2} + (\sigma_{L_{s}}^{l,m}w_{11})^{2}} + \sqrt{\left((\sigma_{R}^{l,m}w_{21})^{2} + (g\,\sigma_{C}^{l,m}w_{31})^{2}\right)(P_{L,R}^{m})^{2} + (\sigma_{R_{s}}^{l,m}w_{21}P_{L,R_{s}}^{m})^{2}} \cdot e^{-j\left(w_{R}^{l,m}\phi_{R}^{m} + w_{R_{s}}^{l,m}\phi_{R_{s}}^{m}\right)}\right)L_{0}^{l,m} \\ &+ \left(\sqrt{(\sigma_{L}^{l,m}w_{12})^{2} + (g\,\sigma_{C}^{l,m}w_{32})^{2} + (\sigma_{L_{s}}^{l,m}w_{12})^{2}} + \sqrt{\left((\sigma_{R}^{l,m}w_{22})^{2} + (g\,\sigma_{C}^{l,m}w_{32})^{2}\right)(P_{L,R}^{m})^{2} + (\sigma_{R_{s}}^{l,m}w_{22}P_{L,R_{s}}^{m})^{2}} \cdot e^{-j\left({w'}_{R}^{l,m}\phi_{R}^{m} + {w'}_{R_{s}}^{l,m}\phi_{R_{s}}^{m}\right)}\right)R_{0}^{l,m}\end{aligned}$

$\begin{aligned}\tilde{R}_{B}^{l,m} ={}& \left(\sqrt{(\sigma_{R}^{l,m}w_{21})^{2} + (g\,\sigma_{C}^{l,m}w_{31})^{2} + (\sigma_{R_{s}}^{l,m}w_{21})^{2}} + \sqrt{\left((\sigma_{L}^{l,m}w_{11})^{2} + (g\,\sigma_{C}^{l,m}w_{31})^{2}\right)(P_{R,L}^{m})^{2} + (\sigma_{L_{s}}^{l,m}w_{11}P_{R,L_{s}}^{m})^{2}} \cdot e^{-j\left(w_{L}^{l,m}\phi_{L}^{m} + w_{L_{s}}^{l,m}\phi_{L_{s}}^{m}\right)}\right)L_{0}^{l,m} \\ &+ \left(\sqrt{(\sigma_{R}^{l,m}w_{22})^{2} + (g\,\sigma_{C}^{l,m}w_{32})^{2} + (\sigma_{R_{s}}^{l,m}w_{22})^{2}} + \sqrt{\left((\sigma_{L}^{l,m}w_{12})^{2} + (g\,\sigma_{C}^{l,m}w_{32})^{2}\right)(P_{R,L}^{m})^{2} + (\sigma_{L_{s}}^{l,m}w_{12}P_{R,L_{s}}^{m})^{2}} \cdot e^{-j\left({w'}_{L}^{l,m}\phi_{L}^{m} + {w'}_{L_{s}}^{l,m}\phi_{L_{s}}^{m}\right)}\right)R_{0}^{l,m}\end{aligned}$

with:

$w_{L}^{l,m} = \frac{\left((\sigma_{L}^{l,m}w_{11})^{2} + (g\,\sigma_{C}^{l,m}w_{31})^{2}\right)(P_{R,L}^{m})^{2}}{\left((\sigma_{L}^{l,m}w_{11})^{2} + (g\,\sigma_{C}^{l,m}w_{31})^{2}\right)(P_{R,L}^{m})^{2} + (\sigma_{L_{s}}^{l,m}w_{11}P_{R,L_{s}}^{m})^{2}}$

$w_{L_{s}}^{l,m} = \frac{(\sigma_{L_{s}}^{l,m}w_{11}P_{R,L_{s}}^{m})^{2}}{\left((\sigma_{L}^{l,m}w_{11})^{2} + (g\,\sigma_{C}^{l,m}w_{31})^{2}\right)(P_{R,L}^{m})^{2} + (\sigma_{L_{s}}^{l,m}w_{11}P_{R,L_{s}}^{m})^{2}}$

${w'}_{L}^{l,m} = \frac{\left((\sigma_{L}^{l,m}w_{12})^{2} + (g\,\sigma_{C}^{l,m}w_{32})^{2}\right)(P_{R,L}^{m})^{2}}{\left((\sigma_{L}^{l,m}w_{12})^{2} + (g\,\sigma_{C}^{l,m}w_{32})^{2}\right)(P_{R,L}^{m})^{2} + (\sigma_{L_{s}}^{l,m}w_{12}P_{R,L_{s}}^{m})^{2}}$

${w'}_{L_{s}}^{l,m} = \frac{(\sigma_{L_{s}}^{l,m}w_{12}P_{R,L_{s}}^{m})^{2}}{\left((\sigma_{L}^{l,m}w_{12})^{2} + (g\,\sigma_{C}^{l,m}w_{32})^{2}\right)(P_{R,L}^{m})^{2} + (\sigma_{L_{s}}^{l,m}w_{12}P_{R,L_{s}}^{m})^{2}}$

$w_{R}^{l,m} = \frac{\left((\sigma_{R}^{l,m}w_{21})^{2} + (g\,\sigma_{C}^{l,m}w_{31})^{2}\right)(P_{L,R}^{m})^{2}}{\left((\sigma_{R}^{l,m}w_{21})^{2} + (g\,\sigma_{C}^{l,m}w_{31})^{2}\right)(P_{L,R}^{m})^{2} + (\sigma_{R_{s}}^{l,m}w_{21}P_{L,R_{s}}^{m})^{2}}$

$w_{R_{s}}^{l,m} = \frac{(\sigma_{R_{s}}^{l,m}w_{21}P_{L,R_{s}}^{m})^{2}}{\left((\sigma_{R}^{l,m}w_{21})^{2} + (g\,\sigma_{C}^{l,m}w_{31})^{2}\right)(P_{L,R}^{m})^{2} + (\sigma_{R_{s}}^{l,m}w_{21}P_{L,R_{s}}^{m})^{2}}$

${w'}_{R}^{l,m} = \frac{\left((\sigma_{R}^{l,m}w_{22})^{2} + (g\,\sigma_{C}^{l,m}w_{32})^{2}\right)(P_{L,R}^{m})^{2}}{\left((\sigma_{R}^{l,m}w_{22})^{2} + (g\,\sigma_{C}^{l,m}w_{32})^{2}\right)(P_{L,R}^{m})^{2} + (\sigma_{R_{s}}^{l,m}w_{22}P_{L,R_{s}}^{m})^{2}}$

${w'}_{R_{s}}^{l,m} = \frac{(\sigma_{R_{s}}^{l,m}w_{22}P_{L,R_{s}}^{m})^{2}}{\left((\sigma_{R}^{l,m}w_{22})^{2} + (g\,\sigma_{C}^{l,m}w_{32})^{2}\right)(P_{L,R}^{m})^{2} + (\sigma_{R_{s}}^{l,m}w_{22}P_{L,R_{s}}^{m})^{2}}$
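Each pair of phase weights above is an energy share: the direct-plus-centre energy and the ambience energy, each divided by their sum, so the two weights of a pair add up to one. A minimal sketch for one pair follows; the function name, the parameter names and the values are hypothetical, and returning zero weights for a silent tile is an assumption of this sketch, not something stated in the text.

```python
def phase_weights(sigma_d, sigma_a, sigma_c, w_side, w_centre, g, p_d, p_a):
    """Energy shares used to weight the direct and ambience phase shifts.

    sigma_d, sigma_a, sigma_c: gains of the direct, ambience and centre channels
    w_side, w_centre:          entries of W for the side and centre rows
    g:                         centre apportionment gain
    p_d, p_a:                  deconvolved contralateral spectra (direct, ambience)
    """
    direct = ((sigma_d * w_side) ** 2 + (g * sigma_c * w_centre) ** 2) * p_d ** 2
    ambience = (sigma_a * w_side * p_a) ** 2
    total = direct + ambience
    if total == 0.0:
        # Assumption: a silent tile carries no phase information.
        return 0.0, 0.0
    return direct / total, ambience / total


# Hypothetical values corresponding to the pair (w_L, w_Ls).
w_l, w_ls = phase_weights(0.8, 0.3, 1.0, 0.9, 0.5, 0.7071, 0.5, 0.4)
silent = phase_weights(0.0, 0.0, 0.0, 0.9, 0.5, 0.7071, 0.5, 0.4)
```

The same function covers the eight weights by changing which gains, W entries and spectra are passed in.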

The expression for the matrix H₁ ^(l,m) is then as follows:

$H_{1}^{l,m} = \begin{bmatrix}\begin{matrix}\sqrt{(\sigma_{L}^{l,m}w_{11})^{2} + (g\,\sigma_{C}^{l,m}w_{31})^{2} + (\sigma_{L_{s}}^{l,m}w_{11})^{2}}\; + \\ \sqrt{\left((\sigma_{R}^{l,m}w_{21})^{2} + (g\,\sigma_{C}^{l,m}w_{31})^{2}\right)(P_{L,R}^{m})^{2} + (\sigma_{R_{s}}^{l,m}w_{21}P_{L,R_{s}}^{m})^{2}} \cdot e^{-j\left(w_{R}^{l,m}\phi_{R}^{m} + w_{R_{s}}^{l,m}\phi_{R_{s}}^{m}\right)}\end{matrix} & \begin{matrix}\sqrt{(\sigma_{L}^{l,m}w_{12})^{2} + (g\,\sigma_{C}^{l,m}w_{32})^{2} + (\sigma_{L_{s}}^{l,m}w_{12})^{2}}\; + \\ \sqrt{\left((\sigma_{R}^{l,m}w_{22})^{2} + (g\,\sigma_{C}^{l,m}w_{32})^{2}\right)(P_{L,R}^{m})^{2} + (\sigma_{R_{s}}^{l,m}w_{22}P_{L,R_{s}}^{m})^{2}} \cdot e^{-j\left({w'}_{R}^{l,m}\phi_{R}^{m} + {w'}_{R_{s}}^{l,m}\phi_{R_{s}}^{m}\right)}\end{matrix} \\ \begin{matrix}\sqrt{(\sigma_{R}^{l,m}w_{21})^{2} + (g\,\sigma_{C}^{l,m}w_{31})^{2} + (\sigma_{R_{s}}^{l,m}w_{21})^{2}}\; + \\ \sqrt{\left((\sigma_{L}^{l,m}w_{11})^{2} + (g\,\sigma_{C}^{l,m}w_{31})^{2}\right)(P_{R,L}^{m})^{2} + (\sigma_{L_{s}}^{l,m}w_{11}P_{R,L_{s}}^{m})^{2}} \cdot e^{-j\left(w_{L}^{l,m}\phi_{L}^{m} + w_{L_{s}}^{l,m}\phi_{L_{s}}^{m}\right)}\end{matrix} & \begin{matrix}\sqrt{(\sigma_{R}^{l,m}w_{22})^{2} + (g\,\sigma_{C}^{l,m}w_{32})^{2} + (\sigma_{R_{s}}^{l,m}w_{22})^{2}}\; + \\ \sqrt{\left((\sigma_{L}^{l,m}w_{12})^{2} + (g\,\sigma_{C}^{l,m}w_{32})^{2}\right)(P_{R,L}^{m})^{2} + (\sigma_{L_{s}}^{l,m}w_{12}P_{R,L_{s}}^{m})^{2}} \cdot e^{-j\left({w'}_{L}^{l,m}\phi_{L}^{m} + {w'}_{L_{s}}^{l,m}\phi_{L_{s}}^{m}\right)}\end{matrix}\end{bmatrix}$

Of course, the present invention is not limited to the embodiment described hereinabove by way of example; it extends to other variants.

Thus, described hereinabove is the case of a processing of two initial stereo signals to be encoded and spatialized to binaural stereo, passing via a 5.1 spatialization. Nonetheless, the invention applies moreover to the processing of an initial mono signal (case where N=1 in the general expression N>0 given hereinabove and applying to the number of initial channels to be processed). Returning for example to the case of the standard “Information technology - MPEG audio technologies - Part 1: MPEG Surround”, ISO/IEC JTC 1/SC 29 (21 Jul. 2006), the equations exhibited in point 6.11.4.1.3.1, for the case of a first processing of the type mono - 5.1 spatialization - binauralization (denoted “5-1-5₁” and consisting in processing from the outset the surround tracks before the central track), simplify to:

$\begin{aligned}(\sigma_{L}^{l,m})^{2} ={}& (\sigma_{L}^{l,m})^{2} + (\sigma_{C}^{l,m}g)^{2} + (\sigma_{Ls}^{l,m})^{2} + (P_{L,R}^{l,m})^{2}\left((\sigma_{R}^{l,m})^{2} + (\sigma_{C}^{l,m}g)^{2}\right) + (P_{L,Rs}^{l,m})^{2}(\sigma_{Rs}^{l,m})^{2} \\ &+ 2P_{L,R}^{l,m}\rho_{R}^{m}\left(\sigma_{L}^{l,m}\sigma_{R}^{l,m}\mathrm{ICC}_{3}^{l,m} + (\sigma_{C}^{l,m}g)^{2}\right)\cos(\phi_{R}^{m}) \\ &+ 2P_{L,Rs}^{l,m}\rho_{Rs}^{m}\sigma_{Ls}^{l,m}\sigma_{Rs}^{l,m}\mathrm{ICC}_{2}^{l,m}\cos(\phi_{Rs}^{m})\end{aligned}$

$\begin{aligned}(\sigma_{R}^{l,m})^{2} ={}& (P_{R,L}^{l,m})^{2}\left((\sigma_{L}^{l,m})^{2} + (\sigma_{C}^{l,m}g)^{2}\right) + (\sigma_{C}^{l,m}g)^{2} + (P_{R,Ls}^{l,m})^{2}(\sigma_{Ls}^{l,m})^{2} + (\sigma_{R}^{l,m})^{2} + (\sigma_{Rs}^{l,m})^{2} \\ &+ 2P_{R,L}^{l,m}\rho_{L}^{m}\left(\sigma_{L}^{l,m}\sigma_{R}^{l,m}\mathrm{ICC}_{3}^{l,m} + (\sigma_{C}^{l,m}g)^{2}\right)\cos(\phi_{L}^{m}) \\ &+ 2P_{R,Ls}^{l,m}\rho_{Ls}^{m}\sigma_{Ls}^{l,m}\sigma_{Rs}^{l,m}\mathrm{ICC}_{2}^{l,m}\cos(\phi_{Ls}^{m})\end{aligned}$

and

$\begin{aligned}\langle L_{B}R_{B}^{*}\rangle^{l,m} ={}& \left((\sigma_{L}^{l,m})^{2} + (g\,\sigma_{C}^{l,m})^{2}\right)P_{R,L}^{l,m}\rho_{L}^{m}\exp(j\phi_{L}) \\ &+ \left((\sigma_{R}^{l,m})^{2} + (g\,\sigma_{C}^{l,m})^{2}\right)P_{L,R}^{l,m}\rho_{R}^{m}\exp(j\phi_{R}) \\ &+ (\sigma_{Ls}^{l,m})^{2}P_{R,Ls}^{l,m}\rho_{C}^{m}\exp(j\phi_{Ls}) \\ &+ (\sigma_{Rs}^{l,m})^{2}P_{L,Rs}^{l,m}\rho_{Rs}^{m}\exp(j\phi_{Rs}) \\ &+ \left(\sigma_{L}^{l,m}\sigma_{R}^{l,m}\mathrm{ICC}_{3}^{l,m} + (g\,\sigma_{C}^{l,m})^{2}\right) \\ &+ \sigma_{Ls}^{l,m}\sigma_{Rs}^{l,m}\mathrm{ICC}_{2}^{l,m} \\ &+ P_{L,R}^{l,m}P_{R,L}^{l,m}\left(\sigma_{L}^{l,m}\sigma_{R}^{l,m}\mathrm{ICC}_{3}^{l,m} + (g\,\sigma_{C}^{l,m})^{2}\right)\rho_{L}^{m}\rho_{R}^{m}\exp\left(j(\phi_{R}^{m} + \phi_{L}^{m})\right) \\ &+ P_{L,Rs}^{l,m}P_{R,Ls}^{l,m}\sigma_{Ls}^{l,m}\sigma_{Rs}^{l,m}\mathrm{ICC}_{3}^{l,m}\rho_{Ls}^{m}\rho_{Rs}^{m}\exp\left(j(\phi_{Rs}^{m} + \phi_{Ls}^{m})\right)\end{aligned}$

Likewise, the equations presented in point 6.11.4.1.3.2, for the case of a first processing of the type mono - 5.1 spatialization - binauralization (denoted “5-1-5₂” and consisting in processing from the outset the central track, and then in processing the surround effect on each track, left and right), simplify to:

$\begin{aligned}(\sigma_{L}^{l,m})^{2} ={}& (\sigma_{L}^{l,m})^{2} + (\sigma_{C}^{l,m}g)^{2} + (\sigma_{Ls}^{l,m})^{2} + (P_{L,R}^{l,m})^{2}\left((\sigma_{R}^{l,m})^{2} + (\sigma_{C}^{l,m}g)^{2}\right) + (P_{L,Rs}^{l,m})^{2}(\sigma_{Rs}^{l,m})^{2} \\ &+ 2P_{L,R}^{l,m}\rho_{R}^{m}\left(\sigma_{L}^{l,m}\sigma_{R}^{l,m}\mathrm{ICC}_{1}^{l,m} + (\sigma_{C}^{l,m}g)^{2}\right)\cos(\phi_{R}^{m}) \\ &+ 2P_{L,Rs}^{l,m}\rho_{Rs}^{m}\sigma_{Ls}^{l,m}\sigma_{Rs}^{l,m}\mathrm{ICC}_{1}^{l,m}\cos(\phi_{Rs}^{m})\end{aligned}$

$\begin{aligned}(\sigma_{R}^{l,m})^{2} ={}& (P_{R,L}^{l,m})^{2}\left((\sigma_{L}^{l,m})^{2} + (\sigma_{C}^{l,m}g)^{2}\right) + (\sigma_{C}^{l,m}g)^{2} + (P_{R,Ls}^{l,m})^{2}(\sigma_{Ls}^{l,m})^{2} + (\sigma_{R}^{l,m})^{2} + (\sigma_{Rs}^{l,m})^{2} \\ &+ 2P_{R,L}^{l,m}\rho_{L}^{m}\left(\sigma_{L}^{l,m}\sigma_{R}^{l,m}\mathrm{ICC}_{1}^{l,m} + (\sigma_{C}^{l,m}g)^{2}\right)\cos(\phi_{L}^{m}) \\ &+ 2P_{R,Ls}^{l,m}\rho_{Ls}^{m}\sigma_{Ls}^{l,m}\sigma_{Rs}^{l,m}\mathrm{ICC}_{1}^{l,m}\cos(\phi_{Ls}^{m})\end{aligned}$

and

$\begin{aligned}\langle L_{B}R_{B}^{*}\rangle^{l,m} ={}& \left((\sigma_{L}^{l,m})^{2} + (g\,\sigma_{C}^{l,m})^{2}\right)P_{R,L}^{l,m}\rho_{L}^{m}\exp(j\phi_{L}) \\ &+ \left((\sigma_{R}^{l,m})^{2} + (g\,\sigma_{C}^{l,m})^{2}\right)P_{L,R}^{l,m}\rho_{R}^{m}\exp(j\phi_{R}) \\ &+ (\sigma_{Ls}^{l,m})^{2}P_{R,Ls}^{l,m}\rho_{C}^{m}\exp(j\phi_{Ls}) \\ &+ (\sigma_{Rs}^{l,m})^{2}P_{L,Rs}^{l,m}\rho_{Rs}^{m}\exp(j\phi_{Rs}) \\ &+ \left(\sigma_{L}^{l,m}\sigma_{R}^{l,m}\mathrm{ICC}_{3}^{l,m} + (g\,\sigma_{C}^{l,m})^{2}\right) \\ &+ \sigma_{Ls}^{l,m}\sigma_{Rs}^{l,m}\mathrm{ICC}_{1}^{l,m} \\ &+ P_{L,R}^{l,m}P_{R,L}^{l,m}\left(\sigma_{L}^{l,m}\sigma_{R}^{l,m}\mathrm{ICC}_{1}^{l,m} + (g\,\sigma_{C}^{l,m})^{2}\right)\rho_{L}^{m}\rho_{R}^{m}\exp\left(j(\phi_{R}^{m} + \phi_{L}^{m})\right) \\ &+ P_{L,Rs}^{l,m}P_{R,Ls}^{l,m}\sigma_{Ls}^{l,m}\sigma_{Rs}^{l,m}\mathrm{ICC}_{1}^{l,m}\rho_{Ls}^{m}\rho_{Rs}^{m}\exp\left(j(\phi_{Rs}^{m} + \phi_{Ls}^{m})\right)\end{aligned}$

More generally, provision may be made for other processing procedures of the signals or of components of signals intended to be played back in binaural or transaural format. For example, the tracks S_(G) and S_(D) of FIG. 4B can furthermore undergo a dynamic low-pass filtering of Dolby® type or the like.

The present invention is also aimed at a module MOD (FIG. 4B) for processing sound data, for passing from a multi-channel format to a binaural or transaural format, in the transformed domain, whose elements could be those illustrated in FIG. 4B. Such a module then comprises processing means, such as a processor PROC and a work memory MEM, for the implementation of the invention. It may be built into any type of decoder, in particular of a device for sound playback (PC computer, personal stereo, mobile telephone, or the like) and optionally for film viewing. As a variant, the module may be designed to operate separately from the playback, for example to prepare contents in the binaural or transaural format, with a view to subsequent decoding.

The present invention is also aimed at a computer program, downloadable via a telecommunication network and/or stored in a memory of a processing module of the aforementioned type and/or stored on a memory medium intended to cooperate with a reader of such a processing module, and comprising instructions for the implementation of the invention, when they are executed by a processor of said module.

The invention claimed is:
 1. A method for processing sound data encoded in a sub-band domain, for dual-channel playback of binaural or Transaural® type, wherein a matrix filtering is applied so as to pass from a sound representation with N channels with N>0, to a dual-channel representation, said sound representation with N channels consisting in considering N virtual loudspeakers surrounding the head of a listener, and, for each virtual loudspeaker of at least some of the loudspeakers: a first transfer function specific to an ipsilateral path from the loudspeaker to a first ear of the listener, facing the loudspeaker, and a second transfer function specific to a contralateral path from said loudspeaker to the second ear of the listener, masked from the loudspeaker by the listener's head, the matrix filtering applied comprising a multiplicative coefficient defined by the spectrum, in the sub-band domain, of the second transfer function deconvolved with the first transfer function, wherein a matrix filtering is applied so as to pass from a sound representation with M channels, with M>0, to a dual-channel representation, by passing through an intermediate representation on said N channels, with N>2, and wherein the coefficients of the matrix are expressed, for a contralateral path, at least as a function of respective spatialization gains of the M channels on the N virtual loudspeakers situated in a hemisphere around a first ear, and of the spectra of the contralateral transfer function, relating to the second ear of the listener, deconvolved with the ipsilateral transfer function, relating to the first ear, while, for an ipsilateral path, the coefficients of the matrix are expressed as a function of spatialization gains of the M channels on the N virtual loudspeakers situated in a hemisphere around a first ear, and wherein the representation with N channels comprises, per hemisphere around an ear, at least one direct virtual loudspeaker and one ambience virtual loudspeaker, the coefficients of the matrix being expressed, in a sub-band domain as time-frequency transform, by:
$h_{L,C}^{l,m} = g\left(1 + P_{L,R}^{m} \cdot e^{-j\phi_{R}^{m}}\right)$, for the paths from a central virtual loudspeaker to the left ear,
$h_{R,C}^{l,m} = g\left(1 + P_{R,L}^{m} \cdot e^{-j\phi_{L}^{m}}\right)$, for the paths from a central virtual loudspeaker to the right ear,
$h_{L,R}^{l,m} = e^{-j\left(w_{R}^{l,m}\phi_{R}^{m} + w_{Rs}^{l,m}\phi_{Rs}^{m}\right)}\sqrt{(\sigma_{R}^{l,m})^{2}(P_{L,R}^{m})^{2} + (\sigma_{Rs}^{l,m})^{2}(P_{L,Rs}^{m})^{2}}$, for the contralateral paths to the left ear;
$h_{R,L}^{l,m} = e^{-j\left(w_{L}^{l,m}\phi_{L}^{m} + w_{Ls}^{l,m}\phi_{Ls}^{m}\right)}\sqrt{(\sigma_{L}^{l,m})^{2}(P_{R,L}^{m})^{2} + (\sigma_{Ls}^{l,m})^{2}(P_{R,Ls}^{m})^{2}}$, for the contralateral paths to the right ear;
$h_{L,L}^{l,m} = \sqrt{(\sigma_{L}^{l,m})^{2} + (\sigma_{Ls}^{l,m})^{2}}$, for the ipsilateral paths to the left ear;
$h_{R,R}^{l,m} = \sqrt{(\sigma_{R}^{l,m})^{2} + (\sigma_{Rs}^{l,m})^{2}}$, for the ipsilateral paths to the right ear;
where: g is a mixing apportionment gain from a central virtual loudspeaker channel to left and right direct loudspeaker channels, $\sigma_{L}^{l,m}$ and $\sigma_{Ls}^{l,m}$ represent relative gains to be applied to one and the same first signal so as to define channels L and Ls respectively of the left direct and left ambience virtual loudspeakers, for sample l of frequency band m in time-frequency transform, $\sigma_{R}^{l,m}$ or $\sigma_{Rs}^{l,m}$ represent relative gains to be applied to one and the same second signal so as to define channels R and Rs of the right direct and right ambience virtual loudspeakers, for sample l of frequency band m in time-frequency transform, $P_{R,L}^{m}$ or $P_{R,Ls}^{m}$ is the expression for the spectrum of the transfer function of contralateral HRTF type, relating to the right ear of the listener, deconvolved with an ipsilateral transfer function, relating to the left ear, for a direct or respectively ambience, left virtual loudspeaker, $P_{L,R}^{m}$ or $P_{L,Rs}^{m}$ is the expression for the spectrum of the transfer function of contralateral HRTF type, relating to the left ear of the listener, deconvolved with an ipsilateral transfer function, relating to the right ear, for a direct or respectively ambience, right virtual loudspeaker, $\phi_{L}^{m}$, $\phi_{Ls}^{m}$, $\phi_{R}^{m}$ and $\phi_{Rs}^{m}$ are phase shifts between contralateral and ipsilateral transfer functions corresponding to chosen interaural delays, and $w_{L}^{l,m}$, $w_{Ls}^{l,m}$, $w_{R}^{l,m}$ and $w_{Rs}^{l,m}$ are chosen weightings.
 2. The method as claimed in claim 1, wherein the coefficients of the matrix vary as a function of frequency, according to a weighting of a chosen factor less than one, if the frequency is less than a chosen threshold, and of one otherwise.
 3. The method as claimed in claim 2, wherein the factor is about 0.5 and the chosen frequency threshold is about 500 Hz so as to eliminate a coloration distortion.
 4. The method as claimed in claim 1, wherein a chosen gain is furthermore applied to two signals, left track and right track, in dual-channel representation, before playback, the chosen gain being controlled so as to limit an energy of the left track and right track signals, to the maximum, to an energy of signals of the virtual loudspeakers.
 5. The method as claimed in claim 4, wherein the coefficients of the matrix vary as a function of frequency, according to a weighting of a chosen factor less than one, if the frequency is less than a chosen threshold, and of one otherwise, and wherein an automatic gain control is applied to the two signals, left track and right track, downstream of the application of the frequency-variable weighting factor.
 6. The method as claimed in claim 1, wherein the matrix filtering is expressed according to a product of matrices of type:
$H_{1}^{l,k} = \begin{bmatrix}h_{L,L}^{l,m} & h_{L,R}^{l,m} & h_{L,C}^{l,m} \\h_{R,L}^{l,m} & h_{R,R}^{l,m} & h_{R,C}^{l,m}\end{bmatrix} \cdot \begin{bmatrix}1 & 0 & 0 & 0 & 0 & 0 \\0 & 1 & 0 & 0 & 0 & 0 \\0 & 0 & 1 & 0 & 0 & 0\end{bmatrix} \cdot W_{temp}^{l,\kappa(k)}, \quad 0 \leq k < K, \quad 0 \leq l < L,$
where: W^(l,m) represents a processing matrix for expanding stereo signals to M′ channels, with M′>2, and
$\begin{bmatrix}h_{L,L}^{l,m} & h_{L,R}^{l,m} & h_{L,C}^{l,m} \\h_{R,L}^{l,m} & h_{R,R}^{l,m} & h_{R,C}^{l,m}\end{bmatrix} \cdot \begin{bmatrix}1 & 0 & 0 & 0 & 0 & 0 \\0 & 1 & 0 & 0 & 0 & 0 \\0 & 0 & 1 & 0 & 0 & 0\end{bmatrix}$
represents a global matrix processing comprising: a processing for expanding M′ channels to said N channels, with N>3, and a process for spatializing the N virtual loudspeakers respectively associated with the N channels so as to obtain a binaural or Transaural®, dual-channel representation, with:
$h_{L,C}^{l,m} = g\left(1 + P_{L,R}^{m} \cdot e^{-j\phi_{R}^{m}}\right)$,
$h_{R,C}^{l,m} = g\left(1 + P_{R,L}^{m} \cdot e^{-j\phi_{L}^{m}}\right)$,
$h_{L,R}^{l,m} = e^{-j\left(w_{R}^{l,m}\phi_{R}^{m} + w_{Rs}^{l,m}\phi_{Rs}^{m}\right)}\sqrt{(\sigma_{R}^{l,m})^{2}(P_{L,R}^{m})^{2} + (\sigma_{Rs}^{l,m})^{2}(P_{L,Rs}^{m})^{2}}$,
$h_{R,L}^{l,m} = e^{-j\left(w_{L}^{l,m}\phi_{L}^{m} + w_{Ls}^{l,m}\phi_{Ls}^{m}\right)}\sqrt{(\sigma_{L}^{l,m})^{2}(P_{R,L}^{m})^{2} + (\sigma_{Ls}^{l,m})^{2}(P_{R,Ls}^{m})^{2}}$,
$h_{L,L}^{l,m} = \sqrt{(\sigma_{L}^{l,m})^{2} + (\sigma_{Ls}^{l,m})^{2}}$ and $h_{R,R}^{l,m} = \sqrt{(\sigma_{R}^{l,m})^{2} + (\sigma_{Rs}^{l,m})^{2}}$.
 7. Themethod as claimed in claim 1, wherein the matrix filtering consists inapplying: a first processing for sub-mixing the N channels to two stereosignals, and a second processing leading, when it is executed jointlywith the first processing, to a spatialization of the N virtualloudspeakers respectively associated with the N channels so as to obtaina binaural or Transaural®, dual-channel representation.
 8. The method as claimed in claim 7, wherein a weighting of the second processing in said matrix filtering is chosen.
 9. The method as claimed in claim 8, wherein the first processing is applied in a coder communicating with a decoder, and the second processing is applied in said decoder.
 10. The method as claimed in claim 6, wherein the matrix filtering consists in applying: a first processing for sub-mixing the N channels to two stereo signals, and a second processing leading, when it is executed jointly with the first processing, to a spatialization of the N virtual loudspeakers respectively associated with the N channels so as to obtain a binaural or Transaural®, dual-channel representation, and wherein the matrix:
$H_{1}^{l,k} = \begin{bmatrix} h_{L,L}^{l,m} & h_{L,R}^{l,m} & h_{L,C}^{l,m} \\ h_{R,L}^{l,m} & h_{R,R}^{l,m} & h_{R,C}^{l,m} \end{bmatrix} \cdot \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \end{bmatrix} \cdot W_{temp}^{l,\kappa(k)},$
is written as a sum of matrices $H_{1}^{l,m} = H_{D}^{l,m} + H_{ABD}^{l,m}$, with: a first matrix representing the first processing being expressed by:
$H_{D}^{l,m} = \begin{bmatrix} \sqrt{(\sigma_{L}^{l,m})^{2} + (\sigma_{Ls}^{l,m})^{2}} & 0 & g \\ 0 & \sqrt{(\sigma_{R}^{l,m})^{2} + (\sigma_{Rs}^{l,m})^{2}} & g \end{bmatrix} \cdot \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \end{bmatrix} \cdot W_{temp}^{l,\kappa(k)}$
and a second matrix representing the second processing being expressed by:
$H_{ABD}^{l,m} = \begin{bmatrix} 0 & X_{12} & g P_{L,R}^{m} e^{-j\phi_{R}^{m}} \\ X_{21} & 0 & g P_{R,L}^{m} e^{-j\phi_{L}^{m}} \end{bmatrix} \cdot \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \end{bmatrix} \cdot W_{temp}^{l,\kappa(k)},$
with
$X_{21} = \sqrt{(\sigma_{L}^{l,m})^{2}(P_{R,L}^{m})^{2} + (\sigma_{Ls}^{l,m})^{2}(P_{R,Ls}^{m})^{2}} \cdot e^{-j(w_{L}^{l,m}\phi_{L}^{m} + w_{Ls}^{l,m}\phi_{Ls}^{m})}$
and
$X_{12} = \sqrt{(\sigma_{R}^{l,m})^{2}(P_{L,R}^{m})^{2} + (\sigma_{Rs}^{l,m})^{2}(P_{L,Rs}^{m})^{2}} \cdot e^{-j(w_{R}^{l,m}\phi_{R}^{m} + w_{Rs}^{l,m}\phi_{Rs}^{m})}.$
 11. A non-transitory computer program product comprising instructions for the implementation of the method as claimed in claim 1, when this program is executed by a processor.
 12. A module for processing sound data encoded in a sub-band domain, for dual-channel playback of binaural or Transaural® type, the module comprising means for applying a matrix filtering so as to pass from a sound representation with N channels with N>0, to a dual-channel representation, said sound representation with N channels consisting in considering N virtual loudspeakers surrounding the head of a listener, and, for each virtual loudspeaker of at least some of the loudspeakers: a first transfer function specific to an ipsilateral path from the loudspeaker to a first ear of the listener, facing the loudspeaker, and a second transfer function specific to a contralateral path from said loudspeaker to the second ear of the listener, masked from the loudspeaker by the listener's head, the matrix filtering applied comprising a multiplicative coefficient defined by the spectrum, in the sub-band domain, of the second transfer function deconvolved with the first transfer function, and the module further comprising means for applying a matrix filtering so as to pass from a sound representation with M channels, with M>0, to a dual-channel representation, by passing through an intermediate representation on said N channels, with N>2, and wherein the coefficients of the matrix are expressed, for a contralateral path, at least as a function of respective spatialization gains of the M channels on the N virtual loudspeakers situated in a hemisphere around a first ear, and of the spectra of the contralateral transfer function, relating to the second ear of the listener, deconvolved with the ipsilateral transfer function, relating to the first ear, while, for an ipsilateral path, the coefficients of the matrix are expressed as a function of spatialization gains of the M channels on the N virtual loudspeakers situated in a hemisphere around a first ear, and wherein the representation with N channels comprises, per hemisphere around an ear, at least one direct virtual loudspeaker and one ambience virtual loudspeaker, the coefficients of the matrix being expressed, in a sub-band domain as time-frequency transform, by:
$h_{L,C}^{l,m} = g\left(1 + P_{L,R}^{m} \cdot e^{-j\phi_{R}^{m}}\right),$ for the paths from a central virtual loudspeaker to the left ear,
$h_{R,C}^{l,m} = g\left(1 + P_{R,L}^{m} \cdot e^{-j\phi_{L}^{m}}\right),$ for the paths from a central virtual loudspeaker to the right ear,
$h_{L,R}^{l,m} = e^{j(w_{R}^{l,m}\phi_{R}^{m} + w_{Rs}^{l,m}\phi_{Rs}^{m})} \sqrt{(\sigma_{R}^{l,m})^{2}(P_{L,R}^{m})^{2} + (\sigma_{Rs}^{l,m})^{2}(P_{L,Rs}^{m})^{2}},$ for the contralateral paths to the left ear;
$h_{R,L}^{l,m} = e^{-j(w_{L}^{l,m}\phi_{L}^{m} + w_{Ls}^{l,m}\phi_{Ls}^{m})} \sqrt{(\sigma_{L}^{l,m})^{2}(P_{R,L}^{m})^{2} + (\sigma_{Ls}^{l,m})^{2}(P_{R,Ls}^{m})^{2}},$ for the contralateral paths to the right ear;
$h_{L,L}^{l,m} = \sqrt{(\sigma_{L}^{l,m})^{2} + (\sigma_{Ls}^{l,m})^{2}},$ for the ipsilateral paths to the left ear;
$h_{R,R}^{l,m} = \sqrt{(\sigma_{R}^{l,m})^{2} + (\sigma_{Rs}^{l,m})^{2}},$ for the ipsilateral paths to the right ear;
where:
g is a mixing apportionment gain from a central virtual loudspeaker channel to left and right direct loudspeaker channels,
$\sigma_{L}^{l,m}$ and $\sigma_{Ls}^{l,m}$ represent relative gains to be applied to one and the same first signal so as to define channels L and Ls respectively of the left direct and left ambience virtual loudspeakers, for sample l of frequency band m in time-frequency transform,
$\sigma_{R}^{l,m}$ and $\sigma_{Rs}^{l,m}$ represent relative gains to be applied to one and the same second signal so as to define channels R and Rs respectively of the right direct and right ambience virtual loudspeakers, for sample l of frequency band m in time-frequency transform,
$P_{R,L}^{m}$ or $P_{R,Ls}^{m}$ is the expression for the spectrum of the transfer function of contralateral HRTF type, relating to the right ear of the listener, deconvolved with an ipsilateral transfer function, relating to the left ear, for a direct or respectively ambience, left virtual loudspeaker,
$P_{L,R}^{m}$ or $P_{L,Rs}^{m}$ is the expression for the spectrum of the transfer function of contralateral HRTF type, relating to the left ear of the listener, deconvolved with an ipsilateral transfer function, relating to the right ear, for a direct or respectively ambience, right virtual loudspeaker,
$\phi_{L}^{m}$, $\phi_{Ls}^{m}$, $\phi_{R}^{m}$ and $\phi_{Rs}^{m}$ are phase shifts between contralateral and ipsilateral transfer functions corresponding to chosen interaural delays, and
$w_{L}^{l,m}$, $w_{Ls}^{l,m}$, $w_{R}^{l,m}$ and $w_{Rs}^{l,m}$ are chosen weightings.
 13. The module as claimed in claim 12, further comprising decoding means of MPEG Surround® type.
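
By way of illustration only, the coefficients recited in claims 10 and 12 may be evaluated for a single time slot l and frequency band m as in the following Python sketch. It is a minimal sketch under stated assumptions, not the claimed implementation: the function names spatialization_matrices and combine, the dictionary layout of the inputs, the scalar weighting alpha (one possible reading of the chosen weighting of claim 8) and all numeric values are hypothetical. The sign of the phase term of the contralateral path to the left ear follows claim 12, and the constant 3x6 selection matrix and the matrix W_temp are omitted.

```python
import cmath
import math


def spatialization_matrices(sigma, P, phi, w, g):
    """Return (H_D, H_ABD) as 2x3 nested lists for one slot l and band m.

    sigma, phi, w: dicts keyed by "L", "Ls", "R", "Rs"
                   (relative gains, phase shifts in radians, weightings).
    P: dict keyed by (ear, loudspeaker), e.g. ("R", "L") stands for P_{R,L}.
    g: apportionment gain of the central channel.
    All names and the data layout are illustrative assumptions.
    """
    # Ipsilateral terms: energy sum of the direct and ambience gains.
    h_LL = math.hypot(sigma["L"], sigma["Ls"])
    h_RR = math.hypot(sigma["R"], sigma["Rs"])

    # Contralateral magnitudes: gains weighted by the deconvolved spectra.
    mag_LR = math.hypot(sigma["R"] * P[("L", "R")], sigma["Rs"] * P[("L", "Rs")])
    mag_RL = math.hypot(sigma["L"] * P[("R", "L")], sigma["Ls"] * P[("R", "Ls")])

    # Contralateral phases; sign convention taken from claim 12 (assumption).
    h_LR = cmath.exp(1j * (w["R"] * phi["R"] + w["Rs"] * phi["Rs"])) * mag_LR
    h_RL = cmath.exp(-1j * (w["L"] * phi["L"] + w["Ls"] * phi["Ls"])) * mag_RL

    # Contralateral contribution of the central channel to each ear.
    c_L = g * P[("L", "R")] * cmath.exp(-1j * phi["R"])
    c_R = g * P[("R", "L")] * cmath.exp(-1j * phi["L"])

    H_D = [[h_LL, 0.0, g], [0.0, h_RR, g]]          # sub-mixing part
    H_ABD = [[0.0, h_LR, c_L], [h_RL, 0.0, c_R]]    # spatialization part
    return H_D, H_ABD


def combine(H_D, H_ABD, alpha=1.0):
    """Return H_D + alpha * H_ABD; alpha is a hypothetical scalar weighting."""
    return [[d + alpha * a for d, a in zip(row_d, row_a)]
            for row_d, row_a in zip(H_D, H_ABD)]


# Hypothetical values for one (l, m) pair.
keys = ("L", "Ls", "R", "Rs")
sigma = {"L": 0.6, "Ls": 0.8, "R": 0.6, "Rs": 0.8}
P = {("L", "R"): 0.5, ("L", "Rs"): 0.5, ("R", "L"): 0.5, ("R", "Ls"): 0.5}
phi = {k: 0.3 for k in keys}
w = {k: 0.5 for k in keys}
g = 1.0 / math.sqrt(2.0)

H_D, H_ABD = spatialization_matrices(sigma, P, phi, w, g)
H_full = combine(H_D, H_ABD)             # alpha = 1: H_D + H_ABD
H_stereo = combine(H_D, H_ABD, alpha=0)  # alpha = 0: sub-mixing only
```

With alpha equal to 1 the sum reproduces the six coefficients h of claim 12, and with alpha equal to 0 only the sub-mixing part remains, which corresponds to the case where the second processing is not applied.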